I am reading the Python cookbook at the moment and am currently looking at generators. I\'m finding it hard to get my head round.
As I come from a Java background, i
It helps to make a clear distinction between the function foo, and the generator foo(n):
def foo(n):
yield n
yield n+1
foo is a function. foo(6) is a generator object.
The typical way to use a generator object is in a loop:
for n in foo(6):
print(n)
The loop prints
# 6
# 7
Think of a generator as a resumable function.
yield
behaves like return
in the sense that values that are yielded get "returned" by the generator. Unlike return, however, the next time the generator gets asked for a value, the generator's function, foo, resumes where it left off -- after the last yield statement -- and continues to run until it hits another yield statement.
Behind the scenes, when you call bar=foo(6)
the generator object bar is defined for you to have a next
attribute.
You can call it yourself to retrieve values yielded from foo:
next(bar) # Works in Python 2.6 or Python 3.x
bar.next() # Works in Python 2.5+, but is deprecated. Use next() if possible.
When foo ends (and there are no more yielded values), calling next(bar)
throws a StopInteration error.
I believe the first appearance of iterators and generators were in the Icon programming language, about 20 years ago.
You may enjoy the Icon overview, which lets you wrap your head around them without concentrating on the syntax (since Icon is a language you probably don't know, and Griswold was explaining the benefits of his language to people coming from other languages).
After reading just a few paragraphs there, the utility of generators and iterators might become more apparent.
First of all, the term generator originally was somewhat ill-defined in Python, leading to lots of confusion. You probably mean iterators and iterables (see here). Then in Python there are also generator functions (which return a generator object), generator objects (which are iterators) and generator expressions (which are evaluated to a generator object).
According to the glossary entry for generator it seems that the official terminology is now that generator is short for "generator function". In the past the documentation defined the terms inconsistently, but fortunately this has been fixed.
It might still be a good idea to be precise and avoid the term "generator" without further specification.
This post will use Fibonacci numbers as a tool to build up to explaining the usefulness of Python generators.
This post will feature both C++ and Python code.
Fibonacci numbers are defined as the sequence: 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, ....
Or in general:
F0 = 0
F1 = 1
Fn = Fn-1 + Fn-2
This can be transferred into a C++ function extremely easily:
size_t Fib(size_t n)
{
//Fib(0) = 0
if(n == 0)
return 0;
//Fib(1) = 1
if(n == 1)
return 1;
//Fib(N) = Fib(N-2) + Fib(N-1)
return Fib(n-2) + Fib(n-1);
}
But if you want to print the first six Fibonacci numbers, you will be recalculating a lot of the values with the above function.
For example: Fib(3) = Fib(2) + Fib(1)
, but Fib(2)
also recalculates Fib(1)
. The higher the value you want to calculate, the worse off you will be.
So one may be tempted to rewrite the above by keeping track of the state in main
.
// Not supported for the first two elements of Fib
size_t GetNextFib(size_t &pp, size_t &p)
{
int result = pp + p;
pp = p;
p = result;
return result;
}
int main(int argc, char *argv[])
{
size_t pp = 0;
size_t p = 1;
std::cout << "0 " << "1 ";
for(size_t i = 0; i <= 4; ++i)
{
size_t fibI = GetNextFib(pp, p);
std::cout << fibI << " ";
}
return 0;
}
But this is very ugly, and it complicates our logic in main
. It would be better to not have to worry about state in our main
function.
We could return a vector
of values and use an iterator
to iterate over that set of values, but this requires a lot of memory all at once for a large number of return values.
So back to our old approach, what happens if we wanted to do something else besides print the numbers? We'd have to copy and paste the whole block of code in main
and change the output statements to whatever else we wanted to do.
And if you copy and paste code, then you should be shot. You don't want to get shot, do you?
To solve these problems, and to avoid getting shot, we may rewrite this block of code using a callback function. Every time a new Fibonacci number is encountered, we would call the callback function.
void GetFibNumbers(size_t max, void(*FoundNewFibCallback)(size_t))
{
if(max-- == 0) return;
FoundNewFibCallback(0);
if(max-- == 0) return;
FoundNewFibCallback(1);
size_t pp = 0;
size_t p = 1;
for(;;)
{
if(max-- == 0) return;
int result = pp + p;
pp = p;
p = result;
FoundNewFibCallback(result);
}
}
void foundNewFib(size_t fibI)
{
std::cout << fibI << " ";
}
int main(int argc, char *argv[])
{
GetFibNumbers(6, foundNewFib);
return 0;
}
This is clearly an improvement, your logic in main
is not as cluttered, and you can do anything you want with the Fibonacci numbers, simply define new callbacks.
But this is still not perfect. What if you wanted to only get the first two Fibonacci numbers, and then do something, then get some more, then do something else?
Well, we could go on like we have been, and we could start adding state again into main
, allowing GetFibNumbers to start from an arbitrary point.
But this will further bloat our code, and it already looks too big for a simple task like printing Fibonacci numbers.
We could implement a producer and consumer model via a couple of threads. But this complicates the code even more.
Instead let's talk about generators.
Python has a very nice language feature that solves problems like these called generators.
A generator allows you to execute a function, stop at an arbitrary point, and then continue again where you left off. Each time returning a value.
Consider the following code that uses a generator:
def fib():
pp, p = 0, 1
while 1:
yield pp
pp, p = p, pp+p
g = fib()
for i in range(6):
g.next()
Which gives us the results:
0 1 1 2 3 5
The yield
statement is used in conjuction with Python generators. It saves the state of the function and returns the yeilded value. The next time you call the next() function on the generator, it will continue where the yield left off.
This is by far more clean than the callback function code. We have cleaner code, smaller code, and not to mention much more functional code (Python allows arbitrarily large integers).
Source
Experience with list comprehensions has shown their widespread utility throughout Python. However, many of the use cases do not need to have a full list created in memory. Instead, they only need to iterate over the elements one at a time.
For instance, the following summation code will build a full list of squares in memory, iterate over those values, and, when the reference is no longer needed, delete the list:
sum([x*x for x in range(10)])
Memory is conserved by using a generator expression instead:
sum(x*x for x in range(10))
Similar benefits are conferred on constructors for container objects:
s = Set(word for line in page for word in line.split())
d = dict( (k, func(k)) for k in keylist)
Generator expressions are especially useful with functions like sum(), min(), and max() that reduce an iterable input to a single value:
max(len(line) for line in file if line.strip())
more
There is no Java equivalent.
Here is a bit of a contrived example:
#! /usr/bin/python
def mygen(n):
x = 0
while x < n:
x = x + 1
if x % 3 == 0:
yield x
for a in mygen(100):
print a
There is a loop in the generator that runs from 0 to n, and if the loop variable is a multiple of 3, it yields the variable.
During each iteration of the for
loop the generator is executed. If it is the first time the generator executes, it starts at the beginning, otherwise it continues from the previous time it yielded.