How to look ahead one element (peek) in a Python generator?

本秂侑毒 提交于 2019-11-26 18:52:43

The Python generator API is one way: You can't push back elements you've read. But you can create a new iterator using the itertools module and prepend the element:

import itertools

gen = iter([1,2,3])
peek = gen.next()
print list(itertools.chain([peek], gen))

For sake of completeness, the more-itertools package (which should probably be part of any Python programmer's toolbox) includes a peekable wrapper that implements this behavior. As the code example in the documentation shows:

>>> p = peekable(xrange(2))
>>> p.peek()
0
>>> p.next()
0
>>> p.peek()
1
>>> p.next()
1

The package is compatible with both Python 2 and 3, even though the documentation shows Python 2 syntax.

Ok - two years too late - but I came across this question, and did not find any of the answers to my satisfaction. Came up with this meta generator:

class Peekorator(object):

    def __init__(self, generator):
        self.empty = False
        self.peek = None
        self.generator = generator
        try:
            self.peek = self.generator.next()
        except StopIteration:
            self.empty = True

    def __iter__(self):
        return self

    def next(self):
        """
        Return the self.peek element, or raise StopIteration
        if empty
        """
        if self.empty:
            raise StopIteration()
        to_return = self.peek
        try:
            self.peek = self.generator.next()
        except StopIteration:
            self.peek = None
            self.empty = True
        return to_return

def simple_iterator():
    for x in range(10):
        yield x*3

pkr = Peekorator(simple_iterator())
for i in pkr:
    print i, pkr.peek, pkr.empty

results in:

0 3 False
3 6 False
6 9 False
9 12 False    
...
24 27 False
27 None False

i.e. you have at any moment during iteration access to the next item in the list.

You can use itertools.tee to produce a lightweight copy of the generator. Then peeking ahead at one copy will not affect the second copy:

import itertools

def process(seq):
    peeker, items = itertools.tee(seq)

    # initial peek ahead
    # so that peeker is one ahead of items
    if next(peeker) == 'STOP':
        return

    for item in items:

        # peek ahead
        if next(peeker) == "STOP":
            return

        # process items
        print(item)

The 'items' generator is unaffected by you molesting 'peeker'. Note that you shouldn't use the original 'seq' after calling 'tee' on it, that will break things.

FWIW, this is the wrong way to solve this problem. Any algorithm that requires you to look 1 item ahead in a generator could alternatively be written to use the current generator item and the previous item. Then you don't have to mangle your use of generators and your code will be much simpler. See my other answer to this question.

>>> gen = iter(range(10))
>>> peek = next(gen)
>>> peek
0
>>> gen = (value for g in ([peek], gen) for value in g)
>>> list(gen)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

Just for fun, I created an implementation of a lookahead class based on the suggestion by Aaron:

import itertools

class lookahead_chain(object):
    def __init__(self, it):
        self._it = iter(it)

    def __iter__(self):
        return self

    def next(self):
        return next(self._it)

    def peek(self, default=None, _chain=itertools.chain):
        it = self._it
        try:
            v = self._it.next()
            self._it = _chain((v,), it)
            return v
        except StopIteration:
            return default

lookahead = lookahead_chain

With this, the following will work:

>>> t = lookahead(xrange(8))
>>> list(itertools.islice(t, 3))
[0, 1, 2]
>>> t.peek()
3
>>> list(itertools.islice(t, 3))
[3, 4, 5]

With this implementation it is a bad idea to call peek many times in a row...

While looking at the CPython source code I just found a better way which is both shorter and more efficient:

class lookahead_tee(object):
    def __init__(self, it):
        self._it, = itertools.tee(it, 1)

    def __iter__(self):
        return self._it

    def peek(self, default=None):
        try:
            return self._it.__copy__().next()
        except StopIteration:
            return default

lookahead = lookahead_tee

Usage is the same as above but you won't pay a price here to use peek many times in a row. With a few more lines you can also look ahead more than one item in the iterator (up to available RAM).

Instead of using items (i, i+1), where 'i' is the current item and i+1 is the 'peek ahead' version, you should be using (i-1, i), where 'i-1' is the previous version from the generator.

Tweaking your algorithm this way will produce something that is identical to what you currently have, apart from the extra needless complexity of trying to 'peek ahead'.

Peeking ahead is a mistake, and you should not be doing it.

This will work -- it buffers an item and calls a function with each item and the next item in the sequence.

Your requirements are murky on what happens at the end of the sequence. What does "look ahead" mean when you're at the last one?

def process_with_lookahead( iterable, aFunction ):
    prev= iterable.next()
    for item in iterable:
        aFunction( prev, item )
        prev= item
    aFunction( item, None )

def someLookaheadFunction( item, next_item ):
    print item, next_item

A simple solution is to use a function like this:

def peek(it):
    first = next(it)
    return first, itertools.chain([first], it)

Then you can do:

>>> it = iter(range(10))
>>> x, it = peek(it)
>>> x
0
>>> next(it)
0
>>> next(it)
1
nilo

If anybody is interested, and please correct me if I am wrong, but I believe it is pretty easy to add some push back functionality to any iterator.

class Back_pushable_iterator:
    """Class whose constructor takes an iterator as its only parameter, and
    returns an iterator that behaves in the same way, with added push back
    functionality.

    The idea is to be able to push back elements that need to be retrieved once
    more with the iterator semantics. This is particularly useful to implement
    LL(k) parsers that need k tokens of lookahead. Lookahead or push back is
    really a matter of perspective. The pushing back strategy allows a clean
    parser implementation based on recursive parser functions.

    The invoker of this class takes care of storing the elements that should be
    pushed back. A consequence of this is that any elements can be "pushed
    back", even elements that have never been retrieved from the iterator.
    The elements that are pushed back are then retrieved through the iterator
    interface in a LIFO-manner (as should logically be expected).

    This class works for any iterator but is especially meaningful for a
    generator iterator, which offers no obvious push back ability.

    In the LL(k) case mentioned above, the tokenizer can be implemented by a
    standard generator function (clean and simple), that is completed by this
    class for the needs of the actual parser.
    """
    def __init__(self, iterator):
        self.iterator = iterator
        self.pushed_back = []

    def __iter__(self):
        return self

    def __next__(self):
        if self.pushed_back:
            return self.pushed_back.pop()
        else:
            return next(self.iterator)

    def push_back(self, element):
        self.pushed_back.append(element)
it = Back_pushable_iterator(x for x in range(10))

x = next(it) # 0
print(x)
it.push_back(x)
x = next(it) # 0
print(x)
x = next(it) # 1
print(x)
x = next(it) # 2
y = next(it) # 3
print(x)
print(y)
it.push_back(y)
it.push_back(x)
x = next(it) # 2
y = next(it) # 3
print(x)
print(y)

for x in it:
    print(x) # 4-9

Although itertools.chain() is the natural tool for the job here, beware of loops like this:

for elem in gen:
    ...
    peek = next(gen)
    gen = itertools.chain([peek], gen)

...Because this will consume a linearly growing amount of memory, and eventually grind to a halt. (This code essentially seems to create a linked list, one node per chain() call.) I know this not because I inspected the libs but because this just resulted in a major slowdown of my program - getting rid of the gen = itertools.chain([peek], gen) line sped it up again. (Python 3.3)

Python3 snippet for @jonathan-hartley answer:

def peek(iterator, eoi=None):
    iterator = iter(iterator)

    try:
        prev = next(iterator)
    except StopIteration:
        return iterator

    for elm in iterator:
        yield prev, elm
        prev = elm

    yield prev, eoi


for curr, nxt in peek(range(10)):
    print((curr, nxt))

# (0, 1)
# (1, 2)
# (2, 3)
# (3, 4)
# (4, 5)
# (5, 6)
# (6, 7)
# (7, 8)
# (8, 9)
# (9, None)

It'd be straightforward to create a class that does this on __iter__ and yields just the prev item and put the elm in some attribute.

w.r.t @David Z's post, the newer seekable tool can reset a wrapped iterator to a prior position.

>>> s = mit.seekable(range(3))
>>> s.next()
# 0

>>> s.seek(0)                                              # reset iterator
>>> s.next()
# 0

>>> s.next()
# 1

>>> s.seek(1)
>>> s.next()
# 1

>>> next(s)
# 2

cytoolz has a peek function.

>> from cytoolz import peek
>> gen = iter([1,2,3])
>> first, continuation = peek(gen)
>> first
1
>> list(continuation)
[1, 2, 3]
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!