I am reading the Python cookbook at the moment and am currently looking at generators. I'm finding it hard to get my head round.
As I come from a Java background, i
Note: this post assumes Python 3.x syntax.†
A generator is simply a function which returns an object on which you can call next, such that for every call it returns some value, until it raises a StopIteration exception, signaling that all values have been generated. Such an object is called an iterator.
Normal functions return a single value using return, just like in Java. In Python, however, there is an alternative, called yield. Using yield anywhere in a function makes it a generator. Observe this code:
>>> def myGen(n):
...     yield n
...     yield n + 1
...
>>> g = myGen(6)
>>> next(g)
6
>>> next(g)
7
>>> next(g)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
As you can see, myGen(n) is a function which yields n and n + 1. Every call to next yields a single value, until all values have been yielded. for loops call next in the background, thus:
>>> for n in myGen(6):
...     print(n)
...
6
7
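To make the "in the background" part concrete, here is a rough sketch of what a for loop does with a generator. The helper name manual_for is made up for illustration, and this is a simplification of the real loop machinery, not its actual implementation:

```python
def my_gen(n):
    yield n
    yield n + 1


def manual_for(iterable):
    """Collect items roughly the way a for loop would (hypothetical helper)."""
    results = []
    it = iter(iterable)  # for a generator, iter() returns the generator itself
    while True:
        try:
            value = next(it)
        except StopIteration:  # a real for loop catches this silently and stops
            break
        results.append(value)
    return results


print(manual_for(my_gen(6)))
```

This is why you never see StopIteration when looping with for: the loop treats it as the signal to stop.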
Likewise there are generator expressions, which provide a means to succinctly describe certain common types of generators:
>>> g = (n for n in range(3, 5))
>>> next(g)
3
>>> next(g)
4
>>> next(g)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
Note that generator expressions are much like list comprehensions:
>>> lc = [n for n in range(3, 5)]
>>> lc
[3, 4]
Observe that a generator object is generated once, but its code is not run all at once. Only calls to next actually execute (part of) the code. Execution of the code in a generator stops once a yield statement has been reached, upon which it returns a value. The next call to next then causes execution to continue in the state in which the generator was left after the last yield. This is a fundamental difference with regular functions: those always start execution at the "top" and discard their state upon returning a value.
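One way to watch this pause-and-resume behaviour is to have the generator record what it has executed so far. The names traced and log below are invented for this sketch:

```python
log = []


def traced():
    log.append("start")           # runs only on the first next()
    yield 1
    log.append("between yields")  # runs only on the second next()
    yield 2
    log.append("end")             # runs only when a third next() is attempted


g = traced()
after_create = list(log)   # nothing has run yet: creating g executes no code

first = next(g)
after_first = list(log)    # ran up to the first yield

second = next(g)
after_second = list(log)   # resumed after the first yield, stopped at the second

print(after_create, after_first, after_second)
```

Calling traced() only builds the generator object; each next(g) runs the body up to the following yield and no further.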
There are more things to be said about this subject. It is e.g. possible to send data back into a generator (reference). But that is something I suggest you do not look into until you understand the basic concept of a generator.
Now you may ask: why use generators? There are a couple of good reasons:
Generators allow for a natural way to describe infinite streams. Consider for example the Fibonacci numbers:
>>> def fib():
...     a, b = 0, 1
...     while True:
...         yield a
...         a, b = b, a + b
...
>>> import itertools
>>> list(itertools.islice(fib(), 10))
[0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
This code uses itertools.islice to take a finite number of elements from an infinite stream. You are advised to have a good look at the functions in the itertools module, as they are essential tools for writing advanced generators with great ease.
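As a small taste of what else itertools offers, the sketch below combines the same fib generator with itertools.takewhile and a generator expression. The variable names are just for illustration:

```python
import itertools


def fib():
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b


# takewhile stops consuming the infinite stream at the first value >= 100
below_100 = list(itertools.takewhile(lambda x: x < 100, fib()))

# a generator expression filters lazily; islice then takes the first five hits
first_even = list(itertools.islice((x for x in fib() if x % 2 == 0), 5))

print(below_100)
print(first_even)
```

Note that the filtering step never builds an intermediate list; each stage pulls one value at a time from the stage before it.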
† About Python 2: in the above examples next is a function which calls the method __next__ on the given object. In Python 2 that method is named next instead of __next__, and in Python <=2.5 there is no built-in next function, so one writes o.next() instead of next(o). Python 2.6 and 2.7 have next() call .next, so you need not use the following there:
>>> g = (n for n in range(3, 5))
>>> g.next()
3
I like to describe generators, to those with a decent background in programming languages and computing, in terms of stack frames.
In many languages, there is a stack on top of which is the current stack "frame". The stack frame includes space allocated for variables local to the function including the arguments passed in to that function.
When you call a function, the current point of execution (the "program counter" or equivalent) is pushed onto the stack, and a new stack frame is created. Execution then transfers to the beginning of the function being called.
With regular functions, at some point the function returns a value, and the stack is "popped". The function's stack frame is discarded and execution resumes at the previous location.
When a function is a generator, it can return a value without the stack frame being discarded, using the yield statement. The values of local variables and the program counter within the function are preserved. This allows the generator to be resumed at a later time, with execution continuing from the yield statement, and it can execute more code and return another value.
Before Python 2.5 this was all generators did. Python 2.5 added the ability to pass values back in to the generator as well. In doing so, the passed-in value is available as an expression resulting from the yield statement which had temporarily returned control (and a value) from the generator.
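A minimal sketch of passing values back in, using the generator's send method. The running_total generator is a made-up example, not something from the Python docs:

```python
def running_total():
    total = 0
    while True:
        # yield hands `total` to the caller and pauses; when the caller
        # resumes with send(x), the yield expression evaluates to x
        value = yield total
        if value is not None:  # a plain next() resumes with None
            total += value


acc = running_total()
primed = next(acc)      # run up to the first yield; nothing can be sent before this
after_5 = acc.send(5)   # total becomes 5
after_10 = acc.send(10) # total becomes 15
unchanged = next(acc)   # resumes with None, so total stays the same

print(primed, after_5, after_10, unchanged)
```

The first next() call is needed because a freshly created generator has not reached a yield yet, so there is nothing waiting to receive a value.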
The key advantage to generators is that the "state" of the function is preserved, unlike with regular functions where each time the stack frame is discarded, you lose all that "state". A secondary advantage is that some of the function call overhead (creating and deleting stack frames) is avoided, though this is usually a minor advantage.
Generators could be thought of as shorthand for creating an iterator. They behave like a Java Iterator. Example:
>>> g = (x for x in range(10))
>>> g
<generator object <genexpr> at 0x7fac1c1e6aa0>
>>> next(g)
0
>>> next(g)
1
>>> next(g)
2
>>> list(g) # force iterating the rest
[3, 4, 5, 6, 7, 8, 9]
>>> next(g) # iterator is at the end; calling next again will raise StopIteration
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
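To show what the "shorthand" saves you, here is roughly the same iterator written twice: once as a hand-written class (closer to what you would write for a Java Iterator) and once as a generator. Both CountUpTo and count_up_to are hypothetical names for this comparison:

```python
class CountUpTo:
    """Hand-written iterator: state lives in instance attributes."""

    def __init__(self, limit):
        self.limit = limit
        self.current = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.current >= self.limit:
            raise StopIteration  # plays the role of hasNext() returning false
        value = self.current
        self.current += 1
        return value


def count_up_to(limit):
    """Generator version: state lives in ordinary local variables."""
    current = 0
    while current < limit:
        yield current
        current += 1


print(list(CountUpTo(5)))
print(list(count_up_to(5)))
```

The generator version lets Python keep track of the state and raise StopIteration for you when the function body ends.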
Hope this helps/is what you are looking for.
Update:
As many other answers are showing, there are different ways to create a generator. You can use the parentheses syntax as in my example above, or you can use yield. Another interesting feature is that generators can be "infinite" -- iterators that don't stop:
>>> def infinite_gen():
...     n = 0
...     while True:
...         yield n
...         n = n + 1
...
>>> g = infinite_gen()
>>> next(g)
0
>>> next(g)
1
>>> next(g)
2
>>> next(g)
3
...
The only thing I can add to Stephan202's answer is a recommendation that you take a look at David Beazley's PyCon '08 presentation "Generator Tricks for Systems Programmers," which is the best single explanation of the how and why of generators that I've seen anywhere. This is the thing that took me from "Python looks kind of fun" to "This is what I've been looking for." It's at http://www.dabeaz.com/generators/.
A generator is effectively a function that returns (data) before it is finished, but it pauses at that point, and you can resume the function at that point.
>>> def myGenerator():
...     yield 'These'
...     yield 'words'
...     yield 'come'
...     yield 'one'
...     yield 'at'
...     yield 'a'
...     yield 'time'
...
>>> myGeneratorInstance = myGenerator()
>>> next(myGeneratorInstance)
'These'
>>> next(myGeneratorInstance)
'words'
and so on. The (or one) benefit of generators is that because they deal with data one piece at a time, you can deal with large amounts of data; with lists, excessive memory requirements could become a problem. Generators, just like lists, are iterable, so they can be used in the same ways:
>>> for word in myGenerator():  # a fresh generator; myGeneratorInstance has already yielded its first two words
...     print(word)
...
These
words
come
one
at
a
time
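The memory point above can be checked roughly with sys.getsizeof. Keep in mind that getsizeof only measures the container object itself (not the integers it refers to), and exact numbers vary by Python version and platform, so treat this as an indication rather than a benchmark:

```python
import sys

n = 100_000

as_list = [i * i for i in range(n)]  # all n results exist in memory at once
as_gen = (i * i for i in range(n))   # results are produced one at a time

list_size = sys.getsizeof(as_list)   # grows with n
gen_size = sys.getsizeof(as_gen)     # small and independent of n

# both can be consumed the same way
total_from_list = sum(as_list)
total_from_gen = sum(as_gen)

print(list_size, gen_size)
```

The trade-off is that the generator can only be walked through once, whereas the list can be indexed and reused.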
Note that generators provide another way to deal with infinity, for example:
>>> from time import gmtime, strftime
>>> def myGen():
...     while True:
...         yield strftime("%a, %d %b %Y %H:%M:%S +0000", gmtime())
...
>>> myGeneratorInstance = myGen()
>>> next(myGeneratorInstance)
'Thu, 28 Jun 2001 14:17:15 +0000'
>>> next(myGeneratorInstance)
'Thu, 28 Jun 2001 14:18:02 +0000'
The generator encapsulates an infinite loop, but this isn't a problem because a value is only computed each time you ask for one.
I put up this piece of code which explains 3 key concepts about generators:
def numbers():
    for i in range(10):
        yield i

gen = numbers()  # this line only returns a generator object; it does not run the code defined inside numbers
for i in gen:  # we iterate over the generator and the values are printed
    print(i)

# the generator is now empty
for i in gen:  # so this for block does not print anything
    print(i)
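If you do need to go through the values a second time, one option is simply to call the generator function again, since every call returns a new, independent generator object. A small sketch along the same lines, with a shorter range to keep the output small:

```python
def numbers():
    for i in range(3):
        yield i


gen = numbers()
first_pass = list(gen)         # consumes the generator
second_pass = list(gen)        # same object, already exhausted
fresh_pass = list(numbers())   # a new call gives a brand-new generator

print(first_pass, second_pass, fresh_pass)
```

If the values are expensive to produce and you need them more than once, storing them in a list after the first pass is the other obvious choice.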