This is rather the inverse of "What can you use Python generator functions for?": Python generators, generator expressions, and the itertools module are some of m
In general, don't use a generator when you need list operations, like len(), reversed(), and so on.
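A quick sketch of what goes wrong, using a made-up generator expression called `squares` (the name and data are only for illustration):

```python
squares = (n * n for n in range(5))  # hypothetical example data

# len() and reversed() need a sized, reversible sequence; a generator is neither.
errors = []
for operation in (len, reversed):
    try:
        operation(squares)
    except TypeError as exc:
        errors.append(type(exc).__name__)

# Materializing the generator into a list makes both operations available.
squares_list = list(squares)
length = len(squares_list)
backwards = list(reversed(squares_list))
```

Both calls fail with a TypeError before consuming anything, so the later list() call still sees every value.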
There may also be times when you don't want lazy evaluation (e.g. to do all the calculation up front so you can release a resource). In that case, a list comprehension might be better.
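To illustrate the resource case, here is a minimal sketch that reads a file. The temporary file and its contents are assumptions made up so the example runs on its own:

```python
import os
import tempfile

# Hypothetical input file, created here only so the example is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("alpha\nbeta\ngamma\n")
    path = tmp.name

# Eager: the list comprehension reads every line while the file is still open.
with open(path) as handle:
    eager = [line.strip() for line in handle]

# Lazy: the generator expression has read nothing by the time the file closes.
with open(path) as handle:
    lazy = (line.strip() for line in handle)

try:
    lazy_result = list(lazy)
except ValueError:  # reading from a closed file
    lazy_result = None

os.remove(path)
```

The eager version lets the `with` block release the file handle right away, while the lazy version only fails later, at the point where it is finally consumed.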
As you mention, "This especially makes sense for large datasets." I think that answers your question.
If you're not hitting any walls performance-wise, you can stick with lists and standard functions, then make the switch when you run into performance problems.
As mentioned by @u0b34a0f6ae in the comments, however, using generators at the start can make it easier for you to scale to larger datasets.
Use a list instead of a generator when:
1) You need to access the data multiple times (i.e. cache the results instead of recomputing them):
for i in outer:         # used once, okay to be a generator or return a list
    for j in inner:     # used multiple times, reusing a list is better
        ...
2) You need random access (or any access other than forward sequential order):
for i in reversed(data): ... # generators aren't reversible
s[i], s[j] = s[j], s[i] # generators aren't indexable
3) You need to join strings (which requires two passes over the data):
s = ''.join(data) # lists are faster than generators in this use case
4) You are using PyPy, which sometimes can't optimize generator code as much as it can normal function calls and list manipulations.
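Point 1 is the one that tends to bite silently, so here is a small sketch of it. The helper `pairs` and the sample data are hypothetical, introduced only for this illustration:

```python
def pairs(outer, inner):
    """Collect every (i, j) combination seen by a nested loop."""
    result = []
    for i in outer:          # consumed once
        for j in inner:      # iterated again on every outer pass
            result.append((i, j))
    return result

# A generator as the inner iterable is exhausted after the first outer pass.
inner_gen = (c for c in "ab")
from_generator = pairs(range(2), inner_gen)

# A list can be iterated as many times as needed.
inner_list = list("ab")
from_list = pairs(range(2), inner_list)
```

No error is raised in the generator case; the second outer pass simply finds nothing left to iterate over, so `from_generator` ends up with half the pairs that `from_list` has.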
You should prefer list comprehensions if you need to keep the values around for something else later and the size of your dataset is not too large.
For example: you are creating a list that you will loop over several times later in your program.
To some extent you can think of generators as a replacement for iteration (loops) vs. list comprehensions as a type of data structure initialization. If you want to keep the data structure then use list comprehensions.
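One way to picture that distinction, as a rough sketch with invented sample data (`readings` is not from the question):

```python
readings = [3, 8, 5, 13]  # hypothetical sample data

# Replacement for a loop: the generator expression feeds sum() directly,
# and no intermediate list is kept.
total_of_squares = sum(r * r for r in readings)

# Data structure initialization: the list is kept and used more than once.
squares = [r * r for r in readings]
largest = max(squares)
count = len(squares)
```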
You should never favor zip over izip, range over xrange, or list comprehensions over generator comprehensions. In Python 3.0 range has xrange-like semantics and zip has izip-like semantics.
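A short sketch of what those semantics look like in practice, assuming a Python 3 interpreter (the variable names are just for illustration):

```python
from collections.abc import Iterator

numbers = range(10**9)         # lazy: no billion-element list is built
size = len(numbers)            # range still supports len() ...
last = numbers[-1]             # ... and indexing

zipped = zip("abc", [1, 2, 3])
zip_is_iterator = isinstance(zipped, Iterator)
first_pass = list(zipped)
second_pass = list(zipped)     # the iterator is already exhausted
```

Note the difference between the two: range is a lazy sequence that can be indexed and reused, whereas zip returns a one-shot iterator.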
For those times you need an actual list, list(frob(x) for x in foo) is actually clearer than a list comprehension.
Profile, Profile, Profile.
Profiling your code is the only way to know if what you're doing has any effect at all.
Most uses of xrange, generators, etc. are over small, fixed-size datasets. It's only when you get to large datasets that it really makes a difference. Choosing xrange() over range() is mostly a matter of making the code look slightly uglier while losing nothing and maybe gaining something.
Profile, Profile, Profile.
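As a starting point, here is a minimal timing sketch using the standard timeit module. The workload (summing squares over a small range) and the function names are assumptions chosen for illustration; substitute your own code and data sizes:

```python
import timeit

data = range(1000)  # hypothetical small dataset

def with_list():
    # Builds the full list first, then sums it.
    return sum([x * x for x in data])

def with_generator():
    # Feeds values to sum() one at a time.
    return sum(x * x for x in data)

list_seconds = timeit.timeit(with_list, number=200)
generator_seconds = timeit.timeit(with_generator, number=200)
print(f"list comprehension:   {list_seconds:.4f}s")
print(f"generator expression: {generator_seconds:.4f}s")
```

The numbers will vary by machine, interpreter, and dataset size, which is exactly why measuring your own case is worth more than a general rule.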