I'm playing with the functional capabilities of Python 3 and I tried to implement the classical algorithm for calculating Hamming numbers. Those are the numbers whose only prime factors are 2, 3 and 5.
I will propose a different approach - using Python's heapq (a min-heap) with a generator (lazy evaluation) - if you don't insist on keeping the merge() function:
from heapq import heappush, heappop

def hamming_numbers(n):
    ans = [1]            # min-heap of candidates
    last = 0             # last number yielded, to skip duplicates
    count = 0
    while count < n:
        x = heappop(ans)
        if x > last:     # the heap holds duplicates, e.g. 6 = 2*3 = 3*2
            yield x
            last = x
            count += 1
        heappush(ans, 2 * x)
        heappush(ans, 3 * x)
        heappush(ans, 5 * x)
>>> n = 20
>>> print(list(hamming_numbers(n)))
[1, 2, 3, 4, 5, 6, 8, 9, 10, 12, 15, 16, 18, 20, 24, 25, 27, 30, 32, 36]
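As a small variant (my own tweak, not part of the code above): a seen set can keep duplicates out of the heap altogether, so the last check becomes unnecessary and the heap stays smaller:

```python
from heapq import heappush, heappop

def hamming_numbers(n):
    heap, seen = [1], {1}
    for _ in range(n):
        x = heappop(heap)
        yield x
        for m in (2 * x, 3 * x, 5 * x):
            if m not in seen:      # push each candidate only once
                seen.add(m)
                heappush(heap, m)
```

It produces the same output as above for the first 20 numbers.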
Your algorithm is incorrect. Your m2, m3, m5 should be scaling the hamming_numbers stream itself, not integers.
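To illustrate what scaling the stream means, here is a minimal sketch of that classic scheme (my own reconstruction, not your original code), using a duplicate-removing merge:

```python
from itertools import islice, tee

def merge(s1, s2):
    # merge two increasing streams, yielding equal heads only once
    x1, x2 = next(s1), next(s2)
    while True:
        if x1 < x2:
            yield x1; x1 = next(s1)
        elif x1 > x2:
            yield x2; x2 = next(s2)
        else:
            yield x1; x1, x2 = next(s1), next(s2)

def hamming():
    yield 1
    h2, h3, h5 = tee(hamming(), 3)
    m2 = (2 * x for x in h2)   # scaled streams, not scaled integers
    m3 = (3 * x for x in h3)
    m5 = (5 * x for x in h5)
    yield from merge(merge(m2, m3), m5)

print(list(islice(hamming(), 10)))
```

This naive version re-creates the stream recursively and is slow for large n, but it shows the intended structure.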
The major problem is this: your merge() calls next() for both its arguments unconditionally, so both get advanced one step. So after producing the first number, e.g. 2 for the m23 generator, on the next invocation it sees its 1st argument as 4(,6,8,...) and its 2nd as 6(,9,12,...). The 3 is already gone. So it always pulls both its arguments, and always returns the head of the 1st (test entry at http://ideone.com/doeX2Q).
Calling iter() is totally superfluous; it adds nothing here. When I remove it (http://ideone.com/7tk85h), the program works exactly the same and produces exactly the same (wrong) output. Normally iter() serves to create a lazy iterator object, but its arguments here are already such generators.
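A quick way to see this (my own check): a generator is already its own iterator, so iter() just returns the same object:

```python
def gen():
    yield from (1, 2, 3)

g = gen()
print(iter(g) is g)    # True: iter() on an iterator returns it unchanged
```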
There's no need to call iter() in your sieve() either (http://ideone.com/kYh7Di). sieve() already defines a generator, and filter() in Python 3 creates an iterator from a function and an iterable (generators are iterable). See also e.g. Difference between Python's Generators and Iterators.
We can do the merge like this, instead:
def merge(s1, s2):
    x1, x2 = next(s1), next(s2)
    while True:
        if x1 < x2:
            yield x1
            x1 = next(s1)
        elif x1 > x2:
            yield x2
            x2 = next(s2)
        else:            # equal heads: yield once, advance both
            yield x1
            x1, x2 = next(s1), next(s2)
Recursion in itself is non-essential in defining the sieve() function too. In fact it only serves to obscure an enormous deficiency of that code: any prime it produces will be tested by all the primes below it in value - but only those below its square root are truly needed. We can fix this quite easily in a non-recursive style (http://ideone.com/Qaycpe):
def sieve(s):    # call as: sieve(integers_from(2))
    x = next(s)
    yield x
    ps = sieve(integers_from(2))    # independent primes supply
    p = next(ps)
    q = p * p
    while True:
        x = next(s)
        while x < q:
            yield x
            x = next(s)
        # here x == q
        s = filter(lambda y, p=p: y % p, s)   # filter creation postponed
        p = next(ps)                          # until square of p seen in input
        q = p * p
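For completeness, a runnable sketch of the above: integers_from is assumed here to be a simple counter (the question defines it; itertools.count behaves the same way):

```python
from itertools import count, islice

def integers_from(n):     # assumed helper: n, n+1, n+2, ...
    return count(n)

def sieve(s):
    x = next(s)
    yield x
    ps = sieve(integers_from(2))    # independent primes supply
    p = next(ps)
    q = p * p
    while True:
        x = next(s)
        while x < q:
            yield x
            x = next(s)
        # here x == q == p*p, the first composite the new filter must catch
        s = filter(lambda y, p=p: y % p, s)
        p = next(ps)
        q = p * p

print(list(islice(sieve(integers_from(2)), 10)))
# → [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```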
This is now much, much, much more efficient (see also: Explain this chunk of haskell code that outputs a stream of primes ).
Recursive or not is just a syntactic characteristic of the code. The actual run-time structures are the same - the filter() adaptors being hoisted on top of an input stream - either at the appropriate moments, or way too soon (so we'd end up with way too many of them).