Reusing generator expressions

Question

Generator expressions are an extremely useful tool, and they have a huge advantage over list comprehensions: they do not allocate memory for a new list.

The problem I have with generator expressions, which eventually makes me end up writing list comprehensions anyway, is that I can only use such a generator once:

>>> names = ['John', 'George', 'Paul', 'Ringo']
>>> has_o = (name for name in names if 'o' in name)
>>> for name in has_o:
...   print(name.upper())
...
JOHN
GEORGE
RINGO
>>> for name in has_o:
...   print(name.lower())
...
>>>

The above code illustrates how the generator expression can only be used once. That is, of course, because a generator expression returns a generator instance, rather than defining a generator function that could be instantiated again and again.
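As a small sketch of why the second loop prints nothing: a list hands out a fresh iterator every time iter() is called on it, while a generator object is its own iterator and stays exhausted once consumed. The variable names first_pass and second_pass are only illustrative.

```python
names = ['John', 'George', 'Paul', 'Ringo']

# A list is an iterable: each iter() call returns a new, independent iterator.
assert iter(names) is not iter(names)

# A generator expression evaluates to a generator object, which is its own iterator.
has_o = (name for name in names if 'o' in name)
assert iter(has_o) is has_o

first_pass = list(has_o)   # consumes the generator
second_pass = list(has_o)  # the same, now exhausted, generator

print(first_pass)   # ['John', 'George', 'Ringo']
print(second_pass)  # []
```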

Is there a way to clone the generator each time it is used, in order to make it reusable, or to make the generator expression syntax return a generator function rather than a single instance?

Answer 1:

Make it a lambda:

has_o = lambda names: (name for name in names if 'o' in name)
for name in has_o(["hello","rrrrr"]):
   print(name.upper())
for name in has_o(["hello","rrrrr"]):
   print(name.upper())

A lambda is a one-liner and returns a new generator each time it is called. Here I chose to pass the input list as a parameter, but if the list is fixed, you don't even need a parameter:

names = ["hello","rrrrr"]
has_o = lambda: (name for name in names if 'o' in name)
for name in has_o():
   print(name.upper())
for name in has_o():
   print(name.upper())

In that last case, be aware that if names is modified or reassigned, the lambda uses the new names object. You can guard against reassignment by using the default-value trick:

has_o = lambda lst=names: (name for name in lst if 'o' in name)

and you can guard against later modification of names by using the default-value-and-copy trick (not super useful when you remember that your original goal was to avoid creating a list :)):

has_o = lambda lst=names[:]: (name for name in lst if 'o' in name)

(now make your pick :))
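To make the difference between the three variants concrete, here is a rough sketch that modifies and then reassigns names and records what each lambda yields. The names late_bound, bound_now, copied_now and demo are made up for this illustration, and everything is wrapped in a function only to keep the example self-contained.

```python
def demo():
    names = ["hello", "rrrrr"]

    # Looks up `names` every time it is called.
    late_bound = lambda: (name for name in names if 'o' in name)
    # Captures the current list object as a default value.
    bound_now = lambda lst=names: (name for name in lst if 'o' in name)
    # Captures a shallow copy of the current list as a default value.
    copied_now = lambda lst=names[:]: (name for name in lst if 'o' in name)

    names.append("world")  # in-place modification of the original list
    after_append = (list(late_bound()), list(bound_now()), list(copied_now()))

    names = ["foo"]        # reassignment to a brand new list
    after_rebind = (list(late_bound()), list(bound_now()), list(copied_now()))

    return after_append, after_rebind


after_append, after_rebind = demo()
print(after_append)  # (['hello', 'world'], ['hello', 'world'], ['hello'])
print(after_rebind)  # (['foo'], ['hello', 'world'], ['hello'])
```

Only the copying variant is unaffected by both kinds of change, at the cost of holding a second list.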

Answer 2:

itertools.tee allows you to make several iterators out of one iterable:

from itertools import tee

names = ['John', 'George', 'Paul', 'Ringo']
has_o_1, has_o_2 = tee((name for name in names if 'o' in name), 2)
print('iterable 1')
for name in has_o_1:
    print(name.upper())
print('iterable 2')
for name in has_o_2:
    print(name.upper())

Output:

iterable 1
JOHN
GEORGE
RINGO
iterable 2
JOHN
GEORGE
RINGO
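Two caveats are worth keeping in mind, shown in the sketch below. The iterators returned by tee() are themselves single-use, so you need to know in advance how many passes you want. Also, tee() buffers the items that one iterator has consumed and another has not, so running one pass to completion before starting the next keeps all items in memory. The itertools documentation notes that list() is faster than tee() in that situation. The variable names are only illustrative.

```python
from itertools import tee

names = ['John', 'George', 'Paul', 'Ringo']
source = (name for name in names if 'o' in name)

# After this call, `source` should not be used directly any more.
first, second = tee(source, 2)

upper = [name.upper() for name in first]   # `second` lags behind, so tee buffers the items
lower = [name.lower() for name in second]  # served from that buffer

# Each tee iterator is still single-use.
again = list(first)

print(upper)  # ['JOHN', 'GEORGE', 'RINGO']
print(lower)  # ['john', 'george', 'ringo']
print(again)  # []
```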

Answer 3:

OK people, here is some code that makes your iterator reusable. It resets itself automatically after each complete iteration, so you do not have to worry about anything. As for efficiency: each item costs two extra method calls (one next() on the tee() object, which in turn calls next() on the underlying iterator) plus a try-except block on top of the original iterator. You have to decide whether that small speed loss is acceptable, or use a lambda to reconstruct the iterator as shown in the other answer.



from itertools import tee

class _ReusableIter:
    """
    This class creates an iterator object that wraps another iterator and makes it reusable
    again after each iteration is finished.
    It makes two "copies" (using tee()) of the original iterator and iterates over the first one.
    The second "copy" is saved for later use.
    When the first iteration reaches its end, it makes two "copies" of the saved "copy"; the
    exhausted iterator is replaced by the new first "copy", which is iterated over next, while
    the second "copy" (a "copy" of the old "copy") waits for the end of that iteration, and so on.
    After each complete iteration, the _ReusableIter() is ready to be iterated over again.

    If you layer a _ReusableIter() over another _ReusableIter(), the result can be an infinite loop
    or some other unpredictable behaviour.
    This is caused by the problem with copying _ReusableIter() instances with tee(), explained below.
    Use the ReusableIter() factory function to create the object.
    It prevents you from adding a new layer over an existing _ReusableIter()
    and returns that object instead.

    If you use the same _ReusableIter() inside nested loops, the outer loop
    gets the first element, the next loop gets the second, and the innermost loop
    consumes the rest; when the innermost loop is done, the iterator is reset and
    you enter an infinite loop. So avoid doing that unless this behaviour is desired.

    It makes no real sense to copy a _ReusableIter() using tee(), but if you are thinking of doing it anyway, don't.
    tee() will not do a good job and the original iterator will not really be copied.
    What you get instead is an extra layer over THE SAME _ReusableIter() for every copy returned.

    TODO: A small speed improvement could be achieved by implementing tee()'s algorithm directly
    in _ReusableIter() and dropping tee() completely.
    """
    def __init__(self, iterator):
        self.iterator, self.copy = tee(iterator)
        # Cache the bound method to save an attribute lookup per item.
        self._next = self.iterator.__next__

    def reset(self):
        # Split the saved copy again: one to iterate over, one to keep for the next reset.
        self.iterator, self.copy = tee(self.copy)
        self._next = self.iterator.__next__

    def __next__(self):
        try:
            return self._next()
        except StopIteration:
            # Re-arm for the next loop, then let the current loop end normally.
            self.reset()
            raise

    def __iter__(self):
        return self

def ReusableIter(iterator):
    # Factory: never stack one _ReusableIter on top of another.
    if isinstance(iterator, _ReusableIter):
        return iterator
    return _ReusableIter(iterator)

Usage:

>>> names = ['John', 'George', 'Paul', 'Ringo']
>>> has_o = ReusableIter(name for name in names if 'o' in name)
>>> for name in has_o:
...     print(name)
...
John
George
Ringo
>>> # And just use it again:
>>> for name in has_o:
...     print(name)
...
John
George
Ringo
>>>
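For comparison, here is a sketch of a different technique that gives the same "just loop again" usage without tee(): a small iterable whose __iter__ builds a fresh generator from a zero-argument factory, in the spirit of the lambda answer. This is not part of the answer above; the class name Reiterable and the function demo are hypothetical. Because every loop gets its own generator, nothing is buffered and nested loops over the same object behave normally, but the underlying expression is re-evaluated on every pass.

```python
class Reiterable:
    """Hypothetical helper: an iterable that asks a factory for a fresh iterator each time."""

    def __init__(self, factory):
        # `factory` is any zero-argument callable returning an iterable.
        self._factory = factory

    def __iter__(self):
        return iter(self._factory())


def demo():
    names = ['John', 'George', 'Paul', 'Ringo']
    has_o = Reiterable(lambda: (name for name in names if 'o' in name))

    first = [name.upper() for name in has_o]
    second = [name.lower() for name in has_o]
    # Nested loops over the same object: each loop level gets its own generator.
    pairs = [(a, b) for a in has_o for b in has_o]
    return first, second, pairs


first, second, pairs = demo()
print(first)       # ['JOHN', 'GEORGE', 'RINGO']
print(second)      # ['john', 'george', 'ringo']
print(len(pairs))  # 9
```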

Source: https://stackoverflow.com/questions/49447447/reusing-generator-expressions

Tags

python

generator

generator-expression