Python Garbage Collection sometimes not working in Jupyter Notebook

深忆病人 2021-02-05 11:26

I'm constantly running out of RAM with some Jupyter Notebooks, and I seem to be unable to release memory that is no longer needed. Here is an example:

import gc
         


        
2 answers
  • 2021-02-05 12:02

    There are a number of issues at play here. The first is that IPython (what Jupyter uses behind the scenes) keeps additional references to objects whenever you see something like Out[67]. In fact, you can use that syntax to recall the object and do something with it, e.g. str(Out[67]). The second problem is that Jupyter seems to keep its own references to output variables, so only a full reset of IPython will work. But that's not much different from just restarting the notebook.

    There is a solution though! I wrote a function that clears all variables except the ones you explicitly ask to keep.

    def my_reset(*varnames):
        """
        varnames are what you want to keep
        """
        globals_ = globals()
        to_save = {v: globals_[v] for v in varnames}
        to_save['my_reset'] = my_reset  # keep this function itself by default
        del globals_
        # run_line_magic replaces the deprecated get_ipython().magic(...)
        get_ipython().run_line_magic("reset", "")
        globals().update(to_save)
    

    You would use it like:

    x = 1
    y = 2
    my_reset('x')
    assert 'y' not in globals()
    assert x == 1
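
    The save, clear, restore pattern inside my_reset can be seen in isolation on a plain dict. This is only a minimal sketch: keep_only and notebook_vars are hypothetical names, and an ordinary dict stands in for the notebook's global namespace. It does not touch IPython's own caches, which is why the real function goes through %reset.

```python
def keep_only(namespace, *varnames):
    """Remove every entry from `namespace` except the names in `varnames`."""
    # Save the values to keep before anything is cleared.
    # Like my_reset, this raises KeyError for a name that does not exist.
    to_save = {name: namespace[name] for name in varnames}
    # Clear in place so anything else holding this dict sees the change.
    namespace.clear()
    # Restore the saved names.
    namespace.update(to_save)


# Hypothetical notebook namespace, modelled as an ordinary dict.
notebook_vars = {"x": 1, "y": 2, "big_list": list(range(1000))}
keep_only(notebook_vars, "x")
remaining = sorted(notebook_vars)
```

    After the call, only "x" is left in notebook_vars; the other values lose that reference and can be freed if nothing else points to them.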
    

    Below I wrote a notebook that shows you a little bit of what is going on behind the scenes and how you can see when something has truly been deleted by using the weakref module. You can try running it to see if it helps you understand what is going on.

    In [1]: class MyObject:
                pass
    
    In [2]: obj = MyObject()
    
    In [3]: # now let's try deleting the object
            # First, create a weak reference to obj, so we can know when it is truly deleted.
            from weakref import ref
            from sys import getrefcount
            r = ref(obj)
            print("the weak reference looks like", r)
            print("it has a reference count of", getrefcount(r()))
            # this prints a ref count of 2 (1 for obj and 1 because getrefcount
            # had a reference to obj)
            del obj
            # since obj was the only strong reference to the object, it should have been 
            # garbage collected now.
            print("the weak reference looks like", r)
    
    the weak reference looks like <weakref at 0x7f29a809d638; to 'MyObject' at 0x7f29a810cf60>
    it has a reference count of 2
    the weak reference looks like <weakref at 0x7f29a809d638; dead>
    
    In [4]: # let's try again, but this time we won't print obj; we'll just
            # evaluate "obj" as the last line of a cell so that it gets displayed
            obj = MyObject()
    
    In [5]: print(getrefcount(obj))
            obj
    
    2
    Out[5]: <__main__.MyObject at 0x7f29a80a0c18>
    
    In [6]: # note the "Out[5]". This is a second reference to our object
            # and will keep it alive if we delete obj
            r = ref(obj)
            del obj
            print("the weak reference looks like", r)
            print("with a reference count of:", getrefcount(r()))
    
    the weak reference looks like <weakref at 0x7f29a809db88; to 'MyObject' at 0x7f29a80a0c18>
    with a reference count of: 7
    
    In [7]: # So what happened? It's that Out[5] that is keeping the object alive.
            # If we clear our Out variables it should go away...
            # As it turns out, Jupyter keeps a number of its own variables lying around,
            # so we have to reset pretty much everything.
    
    In [8]: def my_reset(*varnames):
                """
                varnames are what you want to keep
                """
                globals_ = globals()
                to_save = {v: globals_[v] for v in varnames}
                to_save['my_reset'] = my_reset  # keep this function itself by default
                del globals_
                # run_line_magic replaces the deprecated get_ipython().magic(...)
                get_ipython().run_line_magic("reset", "")
                globals().update(to_save)
    
            my_reset('r') # clear everything except our weak reference to the object
            # in general, pass the names of whatever variables you want to keep around.
    
    Once deleted, variables cannot be recovered. Proceed (y/[n])? y
    
    In [9]: print("the weak reference looks like", r)
    
    the weak reference looks like <weakref at 0x7f29a809db88; dead>
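
    To see which containers are behind an unexpectedly high reference count like the 7 above, the standard gc.get_referrers function can list them. The following is a rough sketch rather than a description of Jupyter's internals: container_referrers, fake_out and fake_history are hypothetical names, and the two containers merely imitate the kind of caches IPython keeps.

```python
import gc


class MyObject:
    pass


def container_referrers(target):
    """Return the dicts and lists that hold a direct reference to `target`."""
    return [holder for holder in gc.get_referrers(target)
            if isinstance(holder, (dict, list))]


obj = MyObject()
fake_out = {5: obj}      # hypothetical stand-in for IPython's Out cache
fake_history = [obj]     # hypothetical stand-in for another internal list

holders = container_referrers(obj)
# Compare by identity: we want these exact containers, not equal ones.
found_out = any(holder is fake_out for holder in holders)
found_history = any(holder is fake_history for holder in holders)
```

    In a real notebook the returned list would also include the global namespace and whatever caches IPython maintains, which gives a hint about what has to be cleared before the object can be freed.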
    
  • 2021-02-05 12:06

    I had the same issue, and after many hours of struggle, the solution that worked for me was very simple: put all your code into a single cell. Within one cell, garbage collection works normally; it is only after the cell finishes that the variables pick up the extra references and can no longer be collected.

    For long notebooks this can be highly inconvenient and hard to read. The underlying idea, though, is that you can garbage-collect a cell's variables from inside that cell. So you could organize your code so that each cell calls gc.collect() at its end, before you leave it.

    Hope this helps :)
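
    One way to arrange a cell along these lines is sketched below. Wrapping the heavy work in a function is an assumption added here, not part of the answer above; BigTable and summarize are hypothetical names, and the weak reference is only there to check that the memory was released.

```python
import gc
import weakref


class BigTable:
    """Stand-in for a large intermediate object."""

    def __init__(self, n):
        self.rows = list(range(n))


def summarize(n):
    # `table` is local, so its last strong reference disappears on return.
    table = BigTable(n)
    probe = weakref.ref(table)  # weak reference, does not keep table alive
    total = sum(table.rows)
    return total, probe


# Everything below would live in the same cell as the definitions above.
total, probe = summarize(100_000)
gc.collect()                  # also sweeps up any reference cycles
released = probe() is None    # the weak reference is dead once table is freed
```

    Only the small result (total) survives the cell; the large intermediate never becomes a notebook-level variable or a displayed output, so nothing outside the cell can hold on to it.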
