What is the difference between iterators and generators? Some examples for when you would use each case would be helpful.
I am writing specifically for Python newbies in a very simple way, though deep down Python does so many things.
Let’s start with the very basic:
Consider a list,
l = [1,2,3]
Let’s write an equivalent function:
def f():
    return [1,2,3]
The output of print(l) is [1, 2, 3], and the output of print(f()) is also [1, 2, 3].
Let’s get an iterator from list l. In Python a list is always iterable, which means you can apply iter() to it whenever you want.
Let’s apply iter() to the list:
iter_l = iter(l) # iterator applied explicitly
Let’s make the function iterable, i.e. write an equivalent generator function.
In Python, as soon as you introduce the keyword yield, the function becomes a generator function, and calling it returns a generator object that is already an iterator.
Note: every generator object is an iterator without any explicit iter() call, and that implicit iterator is the crux. So the generator function will be:
def f():
    yield 1
    yield 2
    yield 3
iter_f = f() # same as iter(f()), as calling f already returns an iterator
So, as you may have noticed, as soon as you made function f a generator function, the object returned by f() is already its own iterator.
Now,
l is the list; after applying iter() it becomes iter(l), a separate list iterator.
f() is already an iterator; after applying iter() it becomes iter(f()), which is the very same object as f().
It is a bit like casting x to int when x is already an int: int(x) gives you the same value back.
For example, the output of:
print(type(iter(iter(l))))
is
<class 'list_iterator'>
Never forget this is Python and not C or C++
Hence the conclusion from the above explanation is:
list l is not the same object as iter(l)
generator object f() is the same object as iter(f())
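A quick way to check this conclusion is the short sketch below. The names count_to_three, numbers, and gen are made up for illustration; the point is only that iter() hands a generator object straight back, while for a list it builds a separate iterator.

```python
def count_to_three():
    """Generator function: calling it returns a generator object."""
    yield 1
    yield 2
    yield 3

numbers = [1, 2, 3]
gen = count_to_three()

# A list is iterable but is not its own iterator: iter() builds a new object.
list_iter = iter(numbers)
print(list_iter is numbers)          # False
print(iter(list_iter) is list_iter)  # True: an iterator returns itself

# A generator object is already an iterator, so iter() hands it straight back.
print(iter(gen) is gen)              # True

# The generator *function* itself is not iterable; only its return value is.
try:
    iter(count_to_three)
except TypeError as exc:
    print("TypeError:", exc)
```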
In summary: Iterators are objects that have an __iter__ and a __next__ (next in Python 2) method. Generators provide an easy, built-in way to create instances of Iterators.
A function with yield in it is still a function, that, when called, returns an instance of a generator object:
def a_function():
    "when called, returns generator object"
    yield
A generator expression also returns a generator:
a_generator = (i for i in range(0))
For a more in-depth exposition and examples, keep reading.
Specifically, generator is a subtype of iterator.
>>> import collections.abc, types
>>> issubclass(types.GeneratorType, collections.abc.Iterator)
True
We can create a generator several ways. A very common and simple way to do so is with a function.
Specifically, a function with yield in it is a function, that, when called, returns a generator:
>>> def a_function():
...     "just a function definition with yield in it"
...     yield
...
>>> type(a_function)
<class 'function'>
>>> a_generator = a_function() # when called
>>> type(a_generator) # returns a generator
<class 'generator'>
And a generator, again, is an Iterator:
>>> isinstance(a_generator, collections.abc.Iterator)
True
An Iterator is an Iterable,
>>> issubclass(collections.abc.Iterator, collections.abc.Iterable)
True
which requires an __iter__ method that returns an Iterator:
>>> collections.abc.Iterable()
Traceback (most recent call last):
  File "<pyshell#79>", line 1, in <module>
    collections.abc.Iterable()
TypeError: Can't instantiate abstract class Iterable with abstract methods __iter__
Some examples of iterables are the built-in tuples, lists, dictionaries, sets, frozen sets, strings, byte strings, byte arrays, ranges and memoryviews:
>>> all(isinstance(element, collections.abc.Iterable) for element in (
...     (), [], {}, set(), frozenset(), '', b'', bytearray(), range(0), memoryview(b'')))
True
An Iterator additionally requires a next (Python 2) or __next__ (Python 3) method.
In Python 2:
>>> collections.Iterator()
Traceback (most recent call last):
  File "<pyshell#80>", line 1, in <module>
    collections.Iterator()
TypeError: Can't instantiate abstract class Iterator with abstract methods next
And in Python 3:
>>> collections.abc.Iterator()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Can't instantiate abstract class Iterator with abstract methods __next__
We can get the iterators from the built-in objects (or custom objects) with the iter function:
>>> all(isinstance(iter(element), collections.abc.Iterator) for element in (
...     (), [], {}, set(), frozenset(), '', b'', bytearray(), range(0), memoryview(b'')))
True
The __iter__ method is called when you attempt to use an object with a for-loop. Then the __next__ method is called on the iterator object to get each item out for the loop. The iterator raises StopIteration when you have exhausted it, and it cannot be reused at that point.
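The loop mechanics described above can be spelled out by hand. This is only a sketch of what a for-loop does conceptually, using a made-up list named items; it is not how the interpreter is literally implemented.

```python
items = ['a', 'b', 'c']

# What `for item in items: ...` does, written out step by step.
iterator = iter(items)          # calls items.__iter__()
collected = []
while True:
    try:
        item = next(iterator)   # calls iterator.__next__()
    except StopIteration:
        break                   # a for-loop handles this exception for you
    collected.append(item)

print(collected)                # ['a', 'b', 'c']

# The iterator is now exhausted; consuming it again yields nothing.
print(list(iterator))           # []
# The list itself can be iterated again, because iter() makes a fresh iterator.
print(list(iter(items)))        # ['a', 'b', 'c']
```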
From the Generator Types section of the Iterator Types section of the Built-in Types documentation:
Python’s generators provide a convenient way to implement the iterator protocol. If a container object’s __iter__() method is implemented as a generator, it will automatically return an iterator object (technically, a generator object) supplying the __iter__() and next() [__next__() in Python 3] methods. More information about generators can be found in the documentation for the yield expression.
(Emphasis added.)
So from this we learn that Generators are a (convenient) type of Iterator.
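As an illustration of the quoted passage, here is a minimal sketch of a container whose __iter__ is written as a generator. The Playlist class and its contents are hypothetical and exist only for this example.

```python
class Playlist:
    """Hypothetical container used only for illustration."""

    def __init__(self, *songs):
        self._songs = list(songs)

    def __iter__(self):
        # Because of `yield`, calling __iter__() returns a generator object,
        # which already supplies __iter__() and __next__().
        for song in self._songs:
            yield song

playlist = Playlist('intro', 'verse', 'outro')
it = iter(playlist)
print(type(it).__name__)   # generator
print(next(it))            # intro
# Each call to iter() creates a fresh generator, so the container is reusable.
print(list(playlist))      # ['intro', 'verse', 'outro']
```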
You might create an object that implements the Iterator protocol by creating or extending your own object.
class Yes(collections.abc.Iterator): # collections.Iterator on Python 2
    def __init__(self, stop):
        self.x = 0
        self.stop = stop
    def __iter__(self):
        return self
    def next(self):
        if self.x < self.stop:
            self.x += 1
            return 'yes'
        else:
            # Iterators must raise when done, else considered broken
            raise StopIteration
    __next__ = next # Python 3 compatibility
But it's easier to simply use a Generator to do this:
def yes(stop):
    for _ in range(stop):
        yield 'yes'
Or perhaps simpler, a Generator Expression (works similarly to list comprehensions):
yes_expr = ('yes' for _ in range(stop))
They can all be used in the same way:
>>> stop = 4
>>> for i, y1, y2, y3 in zip(range(stop), Yes(stop), yes(stop),
...                          ('yes' for _ in range(stop))):
...     print('{0}: {1} == {2} == {3}'.format(i, y1, y2, y3))
...
0: yes == yes == yes
1: yes == yes == yes
2: yes == yes == yes
3: yes == yes == yes
You can use the Iterator protocol directly when you need to extend a Python object as an object that can be iterated over.
However, in the vast majority of cases, you are best suited to use yield to define a function that returns a Generator Iterator or consider Generator Expressions.
Finally, note that generators provide even more functionality as coroutines. I explain Generators, along with the yield statement, in depth in my answer to "What does the “yield” keyword do?".
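To give a flavour of the coroutine use mentioned here, the sketch below feeds values into a generator with send(). The running_total function is a made-up example, not something taken from the linked answer.

```python
def running_total():
    """Generator used as a coroutine: receives values through send()."""
    total = 0
    while True:
        value = yield total   # hand back the total, then wait for the next value
        total += value

acc = running_total()
next(acc)            # prime the generator: run it up to the first yield
print(acc.send(10))  # 10
print(acc.send(5))   # 15
acc.close()          # finish the coroutine explicitly
```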
The examples from Ned Batchelder are highly recommended for iterators and generators.
A function without generators that collects the even numbers from a stream:
def evens(stream):
    them = []
    for n in stream:
        if n % 2 == 0:
            them.append(n)
    return them
while, by using a generator:
def evens(stream):
    for n in stream:
        if n % 2 == 0:
            yield n
The generator version needs neither the list nor a return statement.
Calling the evens function (generator) is as usual:
num = [...]
for n in evens(num):
    do_smth(n)
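One practical consequence of the generator version is that it also works on a stream that never ends. The sketch below assumes itertools.count as a stand-in for such a stream and uses islice to stop after five results.

```python
from itertools import count, islice

def evens(stream):
    """Generator version: yields even numbers one at a time."""
    for n in stream:
        if n % 2 == 0:
            yield n

# count(1) never ends, so a list-building version would never return.
first_five = list(islice(evens(count(1)), 5))
print(first_five)   # [2, 4, 6, 8, 10]
```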
Iterator
A book full of pages is an iterable; a bookmark is an iterator, and this bookmark has nothing to do except move to the next page.
litr = iter([1,2,3])
next(litr) ## 1
next(litr) ## 2
next(litr) ## 3
next(litr) ## raises StopIteration, as we reached the end of the iterator
To use a Generator ... we need a function
To use an Iterator ... we need next and iter
As has been said:
A Generator function returns an iterator object
The whole benefit of an Iterator:
It stores one element at a time in memory
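The memory point can be seen roughly with sys.getsizeof, as in the sketch below. It assumes CPython, and note that getsizeof on a list counts only the list's own storage, not the integers it refers to, so the real gap is larger than what is printed.

```python
import sys

n = 100_000
squares_list = [i * i for i in range(n)]   # all values exist at once
squares_gen = (i * i for i in range(n))    # values are produced on demand

# Exact byte counts vary by Python version and platform, so none are shown.
print(sys.getsizeof(squares_list))
print(sys.getsizeof(squares_gen))

# Both give the same result when consumed.
print(sum(squares_list) == sum(squares_gen))   # True
```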
Previous answers missed this addition: a generator has a close method, while typical iterators don’t. The close method raises a GeneratorExit exception inside the generator at the point where it is paused, so a finally clause in that generator (or an except GeneratorExit handler) gets a chance to run some clean‑up. This abstraction makes generators more usable at scale than simple iterators. One can close a generator as one could close a file, without having to bother about what’s underneath.
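A small sketch of that clean-up behaviour follows. The read_lines generator and the log list are hypothetical stand-ins for a real resource such as a file; when close() is called, Python raises GeneratorExit at the paused yield, which lets the finally block run.

```python
log = []

def read_lines(lines):
    """Hypothetical generator that needs clean-up, like a file reader."""
    log.append('open')
    try:
        for line in lines:
            yield line
    finally:
        # Runs when the generator is exhausted *or* closed early.
        log.append('close')

reader = read_lines(['a', 'b', 'c'])
print(next(reader))   # a
reader.close()        # GeneratorExit is raised inside read_lines at the yield
print(log)            # ['open', 'close']

# A closed generator simply behaves as exhausted.
print(list(reader))   # []
```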
That said, my personal answer to the first question would be: an iterable needs only an __iter__ method, an iterator adds a __next__ method, and generators have both plus an additional close.
For the second question, my personal answer would be: in a public interface, I tend to favor generators a lot, since they are more resilient: they offer the close method and greater composability with yield from. Locally, I may use iterators, but only if it’s a flat and simple structure (iterators do not compose easily) and if there are reasons to believe the sequence is rather short, especially if it may be stopped before it reaches the end. I tend to look at iterators as a low-level primitive, except as literals.
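The composability point can be illustrated with yield from, which lets one generator delegate to another. The flatten function below is a made-up example that walks arbitrarily nested lists.

```python
def flatten(nested):
    """Recursively yield the leaves of arbitrarily nested lists."""
    for item in nested:
        if isinstance(item, list):
            yield from flatten(item)   # delegate to a sub-generator
        else:
            yield item

print(list(flatten([1, [2, [3, 4]], 5])))   # [1, 2, 3, 4, 5]
```

Writing the same traversal as a hand-rolled iterator class would need an explicit stack to remember where each nested level was paused.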
For control flow matters, generators are as important a concept as promises: both are abstract and composable.
It's difficult to answer the question without 2 other concepts: iterable and iterator protocol.
What is the difference between iterator and iterable?
Conceptually you iterate over an iterable with the help of the corresponding iterator. There are a few differences that can help to distinguish iterator and iterable in practice:
1. An iterator has a __next__ method, an iterable does not.
2. Both have an __iter__ method. In the case of an iterable it returns the corresponding iterator. In the case of an iterator it returns itself.
This can help to distinguish iterator and iterable in practice.
>>> x = [1, 2, 3]
>>> dir(x)
[... __iter__ ...]
>>> x_iter = iter(x)
>>> dir(x_iter)
[... __iter__ ... __next__ ...]
>>> type(x_iter)
<class 'list_iterator'>
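Instead of reading dir() output by eye, the same check can be done with the abstract base classes in collections.abc. The describe helper below is hypothetical and only wraps two isinstance calls.

```python
from collections.abc import Iterable, Iterator

def describe(obj):
    """Hypothetical helper: classify obj as 'iterator', 'iterable' or 'neither'."""
    if isinstance(obj, Iterator):      # has both __iter__ and __next__
        return 'iterator'
    if isinstance(obj, Iterable):      # has __iter__ but no __next__
        return 'iterable'
    return 'neither'

x = [1, 2, 3]
x_iter = iter(x)
print(describe(x))               # iterable
print(describe(x_iter))          # iterator
print(describe(42))              # neither
print(iter(x_iter) is x_iter)    # True: an iterator's __iter__ returns itself
print(iter(x) is x)              # False: an iterable returns a separate iterator
```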
What are iterables in Python? list, string, range etc. What are iterators? enumerate, zip, reversed etc. We may check this using the approach above. It's kind of confusing. It would probably be easier if we had only one type. Is there any difference between range and zip? One of the reasons for having both: range has a lot of additional functionality; we may index it or check if it contains some number etc. (see details here).
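The difference between range and zip shows up quickly in a short session like the sketch below; the variable names r and z are arbitrary.

```python
r = range(0, 10, 2)
z = zip('ab', [1, 2])

# range is an iterable sequence: indexable, sized, supports `in`, re-iterable.
print(r[1], len(r), 4 in r)   # 2 5 True
print(list(r), list(r))       # [0, 2, 4, 6, 8] [0, 2, 4, 6, 8]

# zip is an iterator: it allows one pass only and has no indexing or len().
print(list(z))                # [('a', 1), ('b', 2)]
print(list(z))                # []
print(hasattr(r, '__next__'), hasattr(z, '__next__'))   # False True
```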
How can we create an iterator ourselves? Theoretically we may implement the Iterator Protocol (see here). We need to write __next__ and __iter__ methods, raise a StopIteration exception and so on (see Alex Martelli's answer for an example and possible motivation, see also here). But in practice we use generators. It seems to be by far the main method to create iterators in Python.
I can give you a few more interesting examples that show somewhat confusing usage of those concepts in practice:
- in keras we have tf.keras.preprocessing.image.ImageDataGenerator; this class doesn't have __next__ and __iter__ methods, so it's not an iterator (or generator);
- if you call its flow_from_dataframe() method you'll get a DataFrameIterator that has those methods, but it never raises StopIteration (which is not common for built-in iterators in Python); in the documentation we may read "A DataFrameIterator yielding tuples of (x, y)", again a confusing usage of terminology;
- we also have the Sequence class in keras, and that's a custom implementation of generator functionality (regular generators are not suitable for multithreading), but it doesn't implement __next__ and __iter__; rather, it's a wrapper around generators (it uses the yield statement).