Difference between Python's Generators and Iterators

谎友^ 2020-11-22 05:20

What is the difference between iterators and generators? Some examples for when you would use each case would be helpful.

11 Answers
  • 2020-11-22 05:29

    I am writing this specifically for Python newcomers, in a very simple way, even though under the hood Python does much more.

    Let’s start with the very basic:

    Consider a list,

    l = [1,2,3]
    

    Let’s write an equivalent function:

    def f():
        return [1,2,3]
    

    Both print(l) and print(f()) output [1,2,3].

    Let’s make list l iterable: in Python a list is always iterable, which means you can obtain an iterator from it whenever you want.

    Let’s apply iterator on list:

    iter_l = iter(l) # iterator applied explicitly
    

    Let’s make the function iterable, i.e. write the equivalent generator function. In Python, as soon as you introduce the keyword yield in a function, it becomes a generator function, and an iterator is applied implicitly.

    Note: every generator is iterable, with an implicit iterator applied, and this implicit iterator is the crux. So the generator function will be:

    def f():
        yield 1
        yield 2
        yield 3
    
    iter_f = f() # a generator object; calling iter() on it returns the same object
    

    So, as you may have observed: as soon as you make f a generator function, calling it returns an object that is already its own iterator.

    Now,

    l is the list; applying the iterator method "iter" gives iter(l), a new iterator object.

    The generator object f() is already an iterator; applying "iter" to it gives iter(f()), which is the same object again.

    It's a bit like calling int(x) on a value that is already an int: you get the same int back.

    For example o/p of :

    print(type(iter(iter(l))))
    

    is

    <class 'list_iterator'>
    

    Never forget this is Python and not C or C++

    Hence the conclusion from the explanation above:

    list l: iter(l) is a new, separate iterator object

    generator object g = f(): iter(g) is g itself
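This can be checked directly. A small sketch, using the three-yield f from above:

```python
def f():
    yield 1
    yield 2
    yield 3

g = f()              # calling the generator function returns a generator object
print(iter(g) is g)  # True: a generator is already its own iterator

l = [1, 2, 3]
print(iter(l) is l)  # False: iter(l) creates a new, separate iterator object
```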

  • 2020-11-22 05:34

    What is the difference between iterators and generators? Some examples for when you would use each case would be helpful.

    In summary: Iterators are objects that have an __iter__ and a __next__ (next in Python 2) method. Generators provide an easy, built-in way to create instances of Iterators.

    A function with yield in it is still a function, that, when called, returns an instance of a generator object:

    def a_function():
        "when called, returns generator object"
        yield
    

    A generator expression also returns a generator:

    a_generator = (i for i in range(0))
    

    For a more in-depth exposition and examples, keep reading.

    A Generator is an Iterator

    Specifically, a generator is a subtype of iterator. (These abstract base classes live in collections.abc; the old collections.Iterator alias was removed in Python 3.10.)

    >>> import collections.abc, types
    >>> issubclass(types.GeneratorType, collections.abc.Iterator)
    True
    

    We can create a generator several ways. A very common and simple way to do so is with a function.

    Specifically, a function with yield in it is a function, that, when called, returns a generator:

    >>> def a_function():
            "just a function definition with yield in it"
            yield
    >>> type(a_function)
    <class 'function'>
    >>> a_generator = a_function()  # when called
    >>> type(a_generator)           # returns a generator
    <class 'generator'>
    

    And a generator, again, is an Iterator:

    >>> isinstance(a_generator, collections.abc.Iterator)
    True
    

    An Iterator is an Iterable

    An Iterator is an Iterable,

    >>> issubclass(collections.abc.Iterator, collections.abc.Iterable)
    True
    

    which requires an __iter__ method that returns an Iterator:

    >>> collections.abc.Iterable()
    Traceback (most recent call last):
      File "<pyshell#79>", line 1, in <module>
        collections.abc.Iterable()
    TypeError: Can't instantiate abstract class Iterable with abstract methods __iter__
    

    Some examples of iterables are the built-in tuples, lists, dictionaries, sets, frozen sets, strings, byte strings, byte arrays, ranges and memoryviews:

    >>> all(isinstance(element, collections.abc.Iterable) for element in (
            (), [], {}, set(), frozenset(), '', b'', bytearray(), range(0), memoryview(b'')))
    True
    

    Iterators require a next or __next__ method

    In Python 2:

    >>> collections.Iterator()
    Traceback (most recent call last):
      File "<pyshell#80>", line 1, in <module>
        collections.Iterator()
    TypeError: Can't instantiate abstract class Iterator with abstract methods next
    

    And in Python 3:

    >>> collections.abc.Iterator()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: Can't instantiate abstract class Iterator with abstract methods __next__
    

    We can get the iterators from the built-in objects (or custom objects) with the iter function:

    >>> all(isinstance(iter(element), collections.abc.Iterator) for element in (
            (), [], {}, set(), frozenset(), '', b'', bytearray(), range(0), memoryview(b'')))
    True
    

    The __iter__ method is called when you attempt to use an object with a for-loop. Then the __next__ method is called on the iterator object to get each item out for the loop. The iterator raises StopIteration when you have exhausted it, and it cannot be reused at that point.
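What the for-loop does here can be sketched by hand, roughly (a small illustrative example, not the interpreter's actual implementation):

```python
items = ['a', 'b']
it = iter(items)           # the for-loop calls __iter__ first
collected = []
while True:
    try:
        item = next(it)    # then __next__ once per element
    except StopIteration:  # raised when the iterator is exhausted
        break
    collected.append(item)
print(collected)           # ['a', 'b']
```

Once the loop ends, the iterator is spent: a second pass over `it` yields nothing.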

    From the documentation

    From the Generator Types section of the Iterator Types section of the Built-in Types documentation:

    Python’s generators provide a convenient way to implement the iterator protocol. If a container object’s __iter__() method is implemented as a generator, it will automatically return an iterator object (technically, a generator object) supplying the __iter__() and next() [__next__() in Python 3] methods. More information about generators can be found in the documentation for the yield expression.

    (Emphasis added.)

    So from this we learn that Generators are a (convenient) type of Iterator.

    Example Iterator Objects

    You can create an object that implements the Iterator protocol by writing or extending your own class:

    class Yes(collections.abc.Iterator):
    
        def __init__(self, stop):
            self.x = 0
            self.stop = stop
    
        def __iter__(self):
            return self
    
        def next(self):
            if self.x < self.stop:
                self.x += 1
                return 'yes'
            else:
                # Iterators must raise when done, else considered broken
                raise StopIteration
    
        __next__ = next # Python 3 compatibility
    

    But it's easier to simply use a Generator to do this:

    def yes(stop):
        for _ in range(stop):
            yield 'yes'
    

    Or perhaps simpler, a Generator Expression (works similarly to list comprehensions):

    yes_expr = ('yes' for _ in range(stop))
    

    They can all be used in the same way:

    >>> stop = 4             
    >>> for i, y1, y2, y3 in zip(range(stop), Yes(stop), yes(stop), 
                                 ('yes' for _ in range(stop))):
    ...     print('{0}: {1} == {2} == {3}'.format(i, y1, y2, y3))
    ...     
    0: yes == yes == yes
    1: yes == yes == yes
    2: yes == yes == yes
    3: yes == yes == yes
    

    Conclusion

    You can use the Iterator protocol directly when you need to extend a Python object as an object that can be iterated over.

    However, in the vast majority of cases, you are best served by using yield to define a function that returns a Generator Iterator, or by using Generator Expressions.

    Finally, note that generators provide even more functionality as coroutines. I explain Generators, along with the yield statement, in depth on my answer to "What does the “yield” keyword do?".
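A minimal sketch of that coroutine behavior (running_total is an illustrative name, not from the answer above): send() resumes the generator at the paused yield and passes a value in.

```python
def running_total():
    total = 0
    while True:
        value = yield total  # send() resumes here; its argument becomes `value`
        total += value

acc = running_total()
print(next(acc))     # 0: advance to the first yield ("priming" the coroutine)
print(acc.send(10))  # 10
print(acc.send(5))   # 15
```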

  • 2020-11-22 05:38

    Ned Batchelder's examples are highly recommended for iterators and generators.

    A function without generators that collects the even numbers:

    def evens(stream):
        them = []
        for n in stream:
            if n % 2 == 0:
                them.append(n)
        return them
    

    while with a generator:

    def evens(stream):
        for n in stream:
            if n % 2 == 0:
                yield n
    
    • No list and no return statement are needed
    • Efficient for large or infinite streams: it just walks along and yields each value

    Calling the generator version of evens looks just the same:

    num = [...]
    for n in evens(num):
        do_smth(n)
    
    • A generator can also be used to break out of a double loop
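That double-loop point can be sketched like this: moving the nested loops into a generator lets a single break exit both levels.

```python
def cells(rows, cols):
    # nested iteration lives inside the generator
    for r in range(rows):
        for c in range(cols):
            yield r, c

visited = []
for r, c in cells(3, 3):
    if (r, c) == (1, 1):
        break                  # one break leaves both loops at once
    visited.append((r, c))
print(visited)                 # [(0, 0), (0, 1), (0, 2), (1, 0)]
```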

    Iterator

    A book full of pages is an iterable, A bookmark is an iterator

    and this bookmark has nothing to do except to move next

    litr = iter([1,2,3])
    next(litr) ## 1
    next(litr) ## 2
    next(litr) ## 3
    next(litr) ## raises StopIteration: we have reached the end of the iterator
    

    To write a generator, we need a function (containing yield).

    To drive an iterator, we need next and iter.

    As has been said:

    A generator function returns an iterator object.

    The whole benefit of an iterator:

    It holds only one element in memory at a time.
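That memory point can be made concrete with sys.getsizeof (a sketch; absolute sizes vary by Python version, only the contrast matters):

```python
import sys

nums = list(range(1_000_000))        # materializes every element up front
gen = (n for n in range(1_000_000))  # stores only its current state
print(sys.getsizeof(nums))           # several megabytes
print(sys.getsizeof(gen))            # a small constant size, regardless of length
```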

  • 2020-11-22 05:46

    Previous answers missed this addition: a generator has a close method, while typical iterators don’t. The close method raises a GeneratorExit exception inside the generator, which lets a try/finally block in the generator run some clean‑up code. This abstraction makes generators more usable in the large than simple iterators: one can close a generator the way one closes a file, without having to worry about what’s underneath.
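A minimal sketch of that clean-up behavior (tail_lines is an illustrative name): close() raises GeneratorExit at the paused yield, so the finally block runs.

```python
log = []

def tail_lines():
    try:
        while True:
            yield "line"
    finally:
        log.append("cleaning up")  # clean-up runs when close() is called

g = tail_lines()
next(g)      # start the generator; it pauses at the yield
g.close()    # raises GeneratorExit inside g, running the finally block
print(log)   # ['cleaning up']
```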

    That said, my personal answer to the first question would be: an iterable has an __iter__ method only; typical iterators have a __next__ method; generators have both __iter__ and __next__, plus the additional close.

    For the second question, my personal answer would be: in a public interface, I tend to favor generators strongly, since they are more resilient: the close method, and greater composability with yield from. Locally, I may use iterators, but only if the structure is flat and simple (iterators do not compose easily) and if there is reason to believe the sequence is rather short, especially if it may be stopped before it reaches the end. I tend to look at iterators as a low-level primitive, except as literals.

    For control flow matters, generators are as important a concept as promises: both are abstract and composable.

  • 2020-11-22 05:47

    It's difficult to answer the question without two other concepts: iterable and the iterator protocol.

    1. What is the difference between an iterator and an iterable? Conceptually, you iterate over an iterable with the help of the corresponding iterator. A few differences help to tell them apart in practice:
      • An iterator has a __next__ method; an iterable does not.
      • Both have an __iter__ method. For an iterable it returns the corresponding iterator; for an iterator it returns the iterator itself.
    >>> x = [1, 2, 3]
    >>> dir(x) 
    [... __iter__ ...]
    >>> x_iter = iter(x)
    >>> dir(x_iter)
    [... __iter__ ... __next__ ...]
    >>> type(x_iter)
    <class 'list_iterator'>
    
    2. What are iterables in Python? list, string, range, etc. What are iterators? enumerate, zip, reversed, etc. We can check this using the approach above. It is a bit confusing; it might seem easier to have only one type. Is there any difference between range and zip? One reason for the split: range has a lot of additional functionality, for example we can index it or check whether it contains some number (see details here).

    3. How can we create an iterator ourselves? In theory we can implement the Iterator Protocol (see here): write __next__ and __iter__ methods, raise the StopIteration exception, and so on (see Alex Martelli's answer for an example and possible motivation; see also here). But in practice we use generators, which seem to be by far the main way to create iterators in Python.
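A hand-written iterator in that style looks like this (an illustrative sketch; the class name is made up for the example):

```python
class CountDown:
    """Minimal implementation of the Iterator Protocol."""
    def __init__(self, start):
        self.n = start

    def __iter__(self):
        return self            # an iterator returns itself

    def __next__(self):
        if self.n <= 0:
            raise StopIteration  # signals exhaustion to the caller
        self.n -= 1
        return self.n + 1

print(list(CountDown(3)))      # [3, 2, 1]
```

Compare the ceremony above with the one-line generator equivalent, which is why generators are the usual choice.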

    I can give you a few more examples that show the somewhat confusing usage of these concepts in practice:

    • In Keras we have tf.keras.preprocessing.image.ImageDataGenerator; despite its name, this class has neither __next__ nor __iter__ methods, so it is not an iterator (or a generator).
    • If you call its flow_from_dataframe() method you get a DataFrameIterator, which has those methods, but which does not raise StopIteration (uncommon for built-in iterators in Python); in the documentation we may read that "A DataFrameIterator yielding tuples of (x, y)", again a confusing use of the terminology.
    • Keras also has a Sequence class, a custom implementation of generator functionality (regular generators are not suitable for multithreading), but it does not implement __next__ and __iter__; rather, it is a wrapper around generators (it uses the yield statement).