Is it safe to combine 'with' and 'yield' in python?

慢半拍i 2021-02-01 02:02

It's a common idiom in Python to use a context manager to automatically close files:

with open('filename') as my_file:
    # do something with my_file

# my_fil         


        
2 Answers
  • 2021-02-01 02:56

    Is it safe to combine 'with' and 'yield' in python?

    I don't think you should do this.

    Let me demonstrate making some files:

    >>> for f in 'abc':
    ...     with open(f, 'w') as _: pass
    

    Convince ourselves that the files are there:

    >>> for f in 'abc': 
    ...     with open(f) as _: pass 
    

    And here's a function that recreates your code:

    def gen_abc():
        for f in 'abc':
            with open(f) as file:
                yield file
    

    Here it looks like you can use the function:

    >>> [f.closed for f in gen_abc()]
    [False, False, False]
    

    But let's first collect all of the file objects in a list:

    >>> l = [f for f in gen_abc()]
    >>> l
    [<_io.TextIOWrapper name='a' mode='r' encoding='cp1252'>, <_io.TextIOWrapper name='b' mode='r' encoding='cp1252'>, <_io.TextIOWrapper name='c' mode='r' encoding='cp1252'>]
    

    And now we see they are all closed:

    >>> c = [f.closed for f in l]
    >>> c
    [True, True, True]
    

    Each file is only usable until the generator moves on: as soon as the generator is resumed (or exhausted), the with block exits and the file it yielded is closed.

    I doubt that is what you want. Even if you're using lazy evaluation, your last file will probably be closed before you're done using it.
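
To make the timing concrete, here is a minimal sketch. The helper names open_each and demo are hypothetical (not from the question), and the files are created in a temporary directory purely for illustration. It records, at each step of the loop, whether the file just yielded and the one before it are closed.

```python
import os
import tempfile


def open_each(paths):
    """Yield each open file; the with block exits when the generator resumes."""
    for path in paths:
        with open(path) as handle:
            yield handle


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        # Create three small files named a, b and c.
        paths = []
        for name in "abc":
            path = os.path.join(tmp, name)
            with open(path, "w") as out:
                out.write(name + "\n")
            paths.append(path)

        states = []
        previous = None
        for handle in open_each(paths):
            # Record (current file closed?, previous file closed?).
            prev_closed = previous.closed if previous is not None else None
            states.append((handle.closed, prev_closed))
            previous = handle

        # The loop has exhausted the generator, so the last file is closed too.
        return states, previous.closed


states, last_closed = demo()
print(states, last_closed)
```

Under these assumptions, the file yielded most recently is open inside the loop body, while any reference kept from an earlier iteration already points at a closed file.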

  • 2021-02-01 03:07

    You bring up a criticism that has been raised before [1]. The cleanup in this case is non-deterministic, but it will happen with CPython when the generator gets garbage collected. Your mileage may vary for other Python implementations.

    Here's a quick example:

    from __future__ import print_function
    import contextlib
    
    @contextlib.contextmanager
    def manager():
        """Easiest way to get a custom context manager..."""
        try:
            print('Entered')
            yield
        finally:
            print('Closed')
    
    
    def gen():
        """Just a generator with a context manager inside.
    
        When the context is entered, we'll see "Entered" on the console
        and when exited, we'll see "Closed" on the console.
        """
        man = manager()
        with man:
            for i in range(10):
                yield i
    
    
    # Test what happens when we consume a generator.
    list(gen())
    
    def fn():
        g = gen()
        next(g)
        # g.close()
    
    # Test what happens when the generator gets garbage collected inside
    # a function
    print('Start of Function')
    fn()
    print('End of Function')
    
    # Test what happens when a generator gets garbage collected outside
    # a function.  IIRC, this isn't _guaranteed_ to happen in all cases.
    g = gen()
    next(g)
    # g.close()
    print('EOF')
    

    Running this script in CPython, I get:

    $ python ~/sandbox/cm.py
    Entered
    Closed
    Start of Function
    Entered
    Closed
    End of Function
    Entered
    EOF
    Closed
    

    Basically, what we see is that for generators that are exhausted, the context manager cleans up when you expect. For generators that aren't exhausted, the cleanup runs when the generator is collected by the garbage collector. In CPython this happens as soon as the generator goes out of scope and its reference count drops to zero (or, IIRC, at the next gc.collect cycle at the latest).

    However, in some quick experiments (e.g. running the above code under PyPy), not all of my context managers get cleaned up:

    $ pypy --version
    Python 2.7.10 (f3ad1e1e1d62, Aug 28 2015, 09:36:42)
    [PyPy 2.6.1 with GCC 4.2.1 Compatible Apple LLVM 5.1 (clang-503.0.40)]
    $ pypy ~/sandbox/cm.py
    Entered
    Closed
    Start of Function
    Entered
    End of Function
    Entered
    EOF
    

    So, the assertion that the context manager's __exit__ will get called on all Python implementations is untrue. The misses here are likely attributable to PyPy's garbage collection strategy (which isn't reference counting): by the time PyPy decides to reap the generators, the process is already shutting down, so it doesn't bother. In most real-world applications, the generators would probably get reaped and finalized quickly enough that it doesn't actually matter.


    Providing strict guarantees

    If you want to guarantee that your context manager is finalized properly, you should take care to close the generator when you are done with it [2]. Uncommenting the g.close() lines above gives me deterministic cleanup: close() raises a GeneratorExit at the yield statement (which is inside the context manager), the with block runs its cleanup as the exception propagates, and close() then swallows the GeneratorExit:

    $ pypy ~/sandbox/cm.py
    Entered
    Closed
    Start of Function
    Entered
    Closed
    End of Function
    Entered
    Closed
    EOF
    
    $ python3 ~/sandbox/cm.py
    Entered
    Closed
    Start of Function
    Entered
    Closed
    End of Function
    Entered
    Closed
    EOF
    
    $ python ~/sandbox/cm.py
    Entered
    Closed
    Start of Function
    Entered
    Closed
    End of Function
    Entered
    Closed
    EOF
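
The close() mechanism can also be observed directly. The following is a minimal sketch (the generator numbers and the log list are hypothetical names introduced for illustration) that records what happens inside a suspended generator when close() is called on it.

```python
def numbers(log):
    """Yield 0..9 and record what happens around the yield."""
    try:
        log.append("entered")
        for i in range(10):
            yield i
    except GeneratorExit:
        # close() raises GeneratorExit at the suspended yield.
        log.append("GeneratorExit at yield")
        raise
    finally:
        # Cleanup runs deterministically, with no help from the garbage collector.
        log.append("closed")


log = []
g = numbers(log)
first = next(g)  # start the generator and suspend it at the first yield
g.close()        # triggers the except and finally blocks above
g.close()        # closing an already closed generator does nothing
print(first, log)
```

Because the cleanup is driven by the explicit close() call, this behaves the same way regardless of how the interpreter's garbage collector works.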
    

    FWIW, this means that you can clean up your generators using contextlib.closing:

    from contextlib import closing
    with closing(gen_function()) as items:
        for item in items:
            pass # Do something useful!
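
As a slightly fuller sketch of the same idea, the example below leaves the loop early and still gets cleanup at a predictable point. The names resource, gen_items, first_item_at_least and events are hypothetical and exist only for this illustration.

```python
from contextlib import closing, contextmanager

events = []


@contextmanager
def resource():
    """Stand-in for a real resource such as an open file."""
    events.append("Entered")
    try:
        yield
    finally:
        events.append("Closed")


def gen_items():
    """A generator that holds the resource open while it yields."""
    with resource():
        for i in range(10):
            yield i


def first_item_at_least(threshold):
    # closing() calls items.close() on exit, even on an early return.
    with closing(gen_items()) as items:
        for item in items:
            if item >= threshold:
                return item
    return None


result = first_item_at_least(3)
print(result, events)
```

The generator is abandoned after the fourth item, yet "Closed" is recorded before first_item_at_least returns, because closing() closes the generator on the way out of the with block.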
    

    [1] Most recently, some discussion has revolved around PEP 533, which aims to make iterator cleanup more deterministic.
    [2] It is perfectly OK to close an already closed and/or consumed generator, so you can call close() without worrying about the state of the generator.
