Differences between generator comprehension expressions

旧街凉风 提交于 2019-12-03 01:51:06

This is what you should be doing:

g = (i for i in range(10))

It's a generator expression. It's equivalent to

def temp(outer):
    for i in outer:
        yield i
g = temp(range(10))

but if you just wanted an iterable with the elements of range(10), you could have done

g = range(10)

You do not need to wrap any of this in a function.

If you're here to learn what code to write, you can stop reading. The rest of this post is a long and technical explanation of why the other code snippets are broken and should not be used, including an explanation of why your timings are broken too.


This:

g = [(yield i) for i in range(10)]

is a broken construct that should have been taken out years ago. 8 years after the problem was originally reported, the process to remove it is finally beginning. Don't do it.

While it's still in the language, on Python 3, it's equivalent to

def temp(outer):
    l = []
    for i in outer:
        l.append((yield i))
    return l
g = temp(range(10))

List comprehensions are supposed to return lists, but because of the yield, this one doesn't. It acts kind of like a generator expression, and it yields the same things as your first snippet, but it builds an unnecessary list and attaches it to the StopIteration raised at the end.

>>> g = [(yield i) for i in range(10)]
>>> [next(g) for i in range(10)]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> next(g)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration: [None, None, None, None, None, None, None, None, None, None]

This is confusing and a waste of memory. Don't do it. (If you want to know where all those Nones are coming from, read PEP 342.)

On Python 2, g = [(yield i) for i in range(10)] does something entirely different. Python 2 doesn't give list comprehensions their own scope - specifically list comprehensions, not dict or set comprehensions - so the yield is executed by whatever function contains this line. On Python 2, this:

def f():
    g = [(yield i) for i in range(10)]

is equivalent to

def f():
    temp = []
    for i in range(10):
        temp.append((yield i))
    g = temp

making f a generator-based coroutine, in the pre-async sense. Again, if your goal was to get a generator, you've wasted a bunch of time building a pointless list.


This:

g = [(yield from range(10))]

is silly, but none of the blame is on Python this time.

There is no comprehension or genexp here at all. The brackets are not a list comprehension; all the work is done by yield from, and then you build a 1-element list containing the (useless) return value of yield from. Your f3:

def f3():
    g = [(yield from range(10))]

when stripped of the unnecessary list-building, simplifies to

def f3():
    yield from range(10)

or, ignoring all the coroutine support stuff yield from does,

def f3():
    for i in range(10):
        yield i

Your timings are also broken.

In your first timing, f1 and f2 create generator objects that can be used inside those functions, though f2's generator is weird. f3 doesn't do that; f3 is a generator function. f3's body does not run in your timings, and if it did, its g would behave quite unlike the other functions' gs. A timing that would actually be comparable with f1 and f2 would be

def f4():
    g = f3()

In your second timing, f2 doesn't actually run, for the same reason f3 was broken in the previous timing. In your second timing, f2 is not iterating over a generator. Instead, the yield from turns f2 into a generator function itself.

g = [(yield i) for i in range(10)]

This construct accumulates the data that is/may be passed back into the generator through its send() method and returns it via the StopIteration exception when the iteration is exhausted1:

>>> g = [(yield i) for i in range(3)]
>>> next(g)
0
>>> g.send('abc')
1
>>> g.send(123)
2
>>> g.send(4.5)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration: ['abc', 123, 4.5]
>>> #          ^^^^^^^^^^^^^^^^^

No such thing happens with plain generator comprehension:

>>> g = (i for i in range(3))
>>> next(g)
0
>>> g.send('abc')
1
>>> g.send(123)
2
>>> g.send(4.5)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
>>> 

As for the yield from version - in Python 3.5 (which I am using) it doesn't work outside functions, so the illustration is a little different:

>>> def f(): return [(yield from range(3))]
... 
>>> g = f()
>>> next(g)
0
>>> g.send(1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 1, in f
AttributeError: 'range_iterator' object has no attribute 'send'

OK, send() doesn't work for a generator yielding from range() but let's at least see what's at the end of the iteration:

>>> g = f()
>>> next(g)
0
>>> next(g)
1
>>> next(g)
2
>>> next(g)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration: [None]
>>> #          ^^^^^^

1 Note that even if you don't use the send() method, send(None) is assumed, therefore a generator constructed in this way always uses more memory than plain generator comprehension (since it has to accumulate the results of the yield expression till the end of the iteration):

>>> g = [(yield i) for i in range(3)]
>>> next(g)
0
>>> next(g)
1
>>> next(g)
2
>>> next(g)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration: [None, None, None]

UPDATE

Regarding the performance differences between the three variants. yield from beats the other two because it eliminates a level of indirection (which, to the best of my understanding, is one of the two main reasons why yield from was introduced). However, in this particular example yield from itself is superfluous - g = [(yield from range(10))] is actually almost identical to g = range(10).

This might not do what you think it does.

def f2():
    for i in [(yield from range(10))]:
        print(i)

Call it:

>>> def f2():
...     for i in [(yield from range(10))]:
...         print(i)
...
>>> f2() #Doesn't print.
<generator object f2 at 0x02C0DF00>
>>> set(f2()) #Prints `None`, because `(yield from range(10))` evaluates to `None`.
None
{0, 1, 2, 3, 4, 5, 6, 7, 8, 9}

Because the yield from is not within a comprehension, it is bound to the f2 function instead of an implicit function, turning f2 into a generator function.


I remembered seeing someone point out that it was not actually iterating, but I can't remember where I saw that. I was testing the code myself when I rediscovered this. I didn't find the source searching through the mailing list post nor the bug tracker thread. If someone finds the source, please tell me or add it to the post itself, so it can be credited.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!