Merge of lazy streams (using generators) in Python

Posted by 喜夏-厌秋 on 2019-11-29 16:33:23
Will Ness

Your algorithm is incorrect. Your m2, m3, m5 should be scaling hamming_numbers, not integers.

The major problem is this: your merge() calls next() on both of its arguments unconditionally, so both get advanced one step each time. After producing the first number, e.g. 2 for the m23 generator, on the next invocation it sees its 1st argument as 4 (, 6, 8, ...) and its 2nd as 6 (, 9, 12, ...). The 3 is already gone. So it always pulls both of its arguments, and always returns the head of the 1st (test entry at http://ideone.com/doeX2Q).
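To see the effect in isolation, here is a minimal sketch of a merge that advances both of its inputs on every step. The names lossy_merge() and multiples() are made up for this illustration; this is not the code from the question, only an assumed stand-in that shows the behaviour described above.

```python
from itertools import count, islice

def multiples(k):
    """Hypothetical helper: the infinite stream k, 2k, 3k, ..."""
    return (k * i for i in count(1))

def lossy_merge(s1, s2):
    """Illustrative faulty merge: both inputs are advanced on every step."""
    while True:
        x1, x2 = next(s1), next(s2)   # unconditional: both streams move on
        yield min(x1, x2)             # the larger head is discarded for good

m23 = lossy_merge(multiples(2), multiples(3))
first = list(islice(m23, 5))
print(first)   # [2, 4, 6, 8, 10]
```

The heads are compared pairwise as (2,3), (4,6), (6,9), ..., so the 3 and the 9 are dropped without ever being yielded, and the output is just the first stream.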

Calling iter() is totally superfluous; it adds nothing here. When I remove it (http://ideone.com/7tk85h), the program works exactly the same and produces exactly the same (wrong) output. Normally iter() serves to create an iterator object from an iterable, but its arguments here are already generators, which are iterators themselves.

There's no need to call iter() in your sieve() either (http://ideone.com/kYh7Di). sieve() already defines a generator, and filter() in Python 3 creates an iterator from a function and an iterable (generators are iterable). See also e.g. "Difference between Python's Generators and Iterators".
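Both points about iter() can be checked directly in Python 3. The short sketch below uses throwaway names (gen, odd, nums) chosen only for this illustration:

```python
gen = (n * n for n in range(5))           # a generator: already an iterator
assert iter(gen) is gen                   # iter() hands back the very same object

odd = filter(lambda n: n % 2, range(10))  # Python 3: filter() returns an iterator
assert iter(odd) is odd
assert next(odd) == 1                     # so next() works on it directly

nums = [1, 2, 3]                          # a list is iterable but not an iterator
assert iter(nums) is not nums             # only here does iter() create something new
```

So wrapping a generator or a filter object in iter() changes nothing; it only matters for plain containers such as lists.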

We can do the merge like this, instead:

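def merge(s1, s2):
    # Hold the current head of each stream; advance only the stream(s)
    # whose head was just yielded.
    x1, x2 = next(s1), next(s2)
    while True:
        if x1 < x2:
            yield x1
            x1 = next(s1)
        elif x1 > x2:
            yield x2
            x2 = next(s2)
        else:                       # equal heads: yield once, advance both
            yield x1
            x1, x2 = next(s1), next(s2)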
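As a rough sketch of how the pieces could fit together, the Hamming sequence can be built by scaling the sequence itself rather than the integers. merge() is repeated here so that the snippet runs on its own; scale() is a hypothetical helper, and sharing one inner copy through itertools.tee is just one possible arrangement, assumed here for illustration and not taken from the question.

```python
from itertools import islice, tee

def merge(s1, s2):
    """Merge two infinite increasing streams, yielding equal heads once."""
    x1, x2 = next(s1), next(s2)
    while True:
        if x1 < x2:
            yield x1
            x1 = next(s1)
        elif x1 > x2:
            yield x2
            x2 = next(s2)
        else:
            yield x1
            x1, x2 = next(s1), next(s2)

def scale(s, k):
    """Hypothetical helper: multiply every element of stream s by k."""
    for x in s:
        yield x * k

def hamming_numbers():
    yield 1
    # Scale the Hamming sequence itself, not the plain integers.
    h2, h3, h5 = tee(hamming_numbers(), 3)   # one inner copy, shared three ways
    yield from merge(scale(h2, 2), merge(scale(h3, 3), scale(h5, 5)))

first = list(islice(hamming_numbers(), 15))
print(first)   # [1, 2, 3, 4, 5, 6, 8, 9, 10, 12, 15, 16, 18, 20, 24]
```

Each level lazily spawns another inner copy of the sequence when it needs one, so this is meant to show the structure, not to be fast.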

Recursion in itself is not essential to defining the sieve() function either. In fact, it only serves to obscure an enormous deficiency of that code: any prime it produces will be tested by all the primes below it in value, but only those not above its square root are truly needed. We can fix it quite easily in a non-recursive style (http://ideone.com/Qaycpe):

def sieve(s):    # call as: sieve(integers_from(2))
    x = next(s)
    yield x
    ps = sieve(integers_from(2))            # independent primes supply
    p = next(ps)
    q = p*p
    print((p, q))                           # debug trace of (prime, square)
    while True:
        x = next(s)
        while x < q:
            yield x
            x = next(s)
        # here x == q
        s = filter(lambda y, p=p: y % p, s) # filter creation postponed
        p = next(ps)                        #   until square of p seen in input
        q = p*p

This is now much, much more efficient, since each candidate is only tested by the primes not above its square root (see also: "Explain this chunk of haskell code that outputs a stream of primes").
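One way to get a feel for the difference is to count the divisibility tests each approach performs. The sketch below is an assumption-laden illustration: eager_sieve() is only my guess at the shape of the eager approach (a filter added as soon as each prime is found), not_multiple_of() and the tests dictionary are made-up instrumentation, and itertools.count(2) stands in for integers_from(2).

```python
from itertools import count, islice

tests = {"eager": 0, "postponed": 0}     # divisibility tests performed

def not_multiple_of(p, key):
    """Filter predicate that also counts how often it is called."""
    def check(y):
        tests[key] += 1
        return y % p != 0
    return check

def eager_sieve(s):
    """Assumed shape of the eager approach: a filter per prime, added at once."""
    while True:
        p = next(s)
        yield p
        s = filter(not_multiple_of(p, "eager"), s)

def postponed_sieve(s):
    """Same structure as the sieve above, with counting predicates."""
    x = next(s)
    yield x
    ps = postponed_sieve(count(2))       # independent primes supply
    p = next(ps)
    q = p * p
    while True:
        x = next(s)
        while x < q:
            yield x
            x = next(s)
        s = filter(not_multiple_of(p, "postponed"), s)   # x == q here
        p = next(ps)
        q = p * p

a = list(islice(eager_sieve(count(2)), 100))
b = list(islice(postponed_sieve(count(2)), 100))
assert a == b and a[:5] == [2, 3, 5, 7, 11]
print(tests)   # the "postponed" count should come out far smaller
```

The postponed count includes the work done by the inner primes supply, so the comparison is not tilted in its favour.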

Whether the code is recursive or not is just a syntactic characteristic. The actual run-time structures are the same: the filter() adaptors being hoisted on top of an input stream, either at the appropriate moments or way too soon (in which case we end up with way too many of them).
