How to implement an efficient infinite generator of prime numbers in Python?

醉酒成梦 2020-11-22 01:50

This is not homework; I am just curious.

INFINITE is the key word here.

I wish to use it as for p in primes(). I believe that this is a built-

13 answers
  • 2020-11-22 02:22

    Here is a complicated heap-based implementation, which is not much faster than other heap-based implementations (see the speed comparison in another answer of mine), but it uses much less memory.

    This implementation uses two heaps (tu and vw), which contain the same number of elements. Each element is an int pair. In order to find all primes up to q**2 (where q is a prime), each heap will contain at most 2*pi(q-1) elements, where pi(x) is the number of positive primes not larger than x. So, to find all primes up to n, the two heaps together hold at most 4*pi(floor(sqrt(n))) elements. (We could gain a factor of 2 in memory by pushing half as much stuff to the heaps, but that would make the algorithm slower.)

    Other dict- and heap-based approaches (e.g. erat2b, heap_prime_gen_squares and heapprimegen) store about 2*pi(n) integers, because they extend their heap or dict every time they find a prime. For comparison: to find the first 1_000_000 primes, this implementation stores fewer than 4141 integers, while the other implementations store more than 1_000_000 integers.

    import heapq
    
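    def heap_prime_gen_smallmem():
        yield 2
        yield 3
        # f: next candidate to output; it runs over 5, 7, 11, 13, ... (numbers
        # coprime to 6), and fmar3 tracks the alternating +2/+4 step.
        f = 5
        fmar3 = 2
        # q: next candidate base prime, q6 == 6 * q; qmar3 tracks q's +2/+4 step.
        q = 7
        q6 = 7 * 6
        qmar3 = 4
        # Both heaps hold (next multiple, step) pairs, seeded with the multiples
        # of 5 coprime to 6 (25, 55, 85, ... and 35, 65, 95, ...).
        # tu decides whether q is prime; vw sieves the output candidates f.
        tu = [(25, 30), (35, 30)]
        vw = [(25, 30), (35, 30)]
        while True:
            # qb is the number after q on the wheel; q6b == 6 * qb.
            qmar3 += 2
            if qmar3 == 6:
                qb = q + 4
                q6b = q6 + 24
                qmar3 = 2
            else:
                qb = q + 2
                q6b = q6 + 12
            if q < tu[0][0]:
                # q is prime: first output the remaining primes f below q*q ...
                d = q * q
                while f < d:
                    a, b = vw[0]
                    if f < a:
                        yield f
                    else:
                        # f is composite: advance every multiple that is <= f.
                        heapq.heapreplace(vw, (a + b, b))
                        a, b = vw[0]
                        while f >= a:
                            heapq.heapreplace(vw, (a + b, b))
                            a, b = vw[0]
                    fmar3 += 2
                    if fmar3 == 6:
                        f += 4
                        fmar3 = 2
                    else:
                        f += 2
                # ... then start sieving with q: its multiples coprime to 6 are
                # q*q + 6*q*k and q*qb + 6*q*k for k = 0, 1, 2, ...
                c = q * qb
                heapq.heappush(tu, (d, q6))
                heapq.heappush(tu, (c, q6))
                heapq.heappush(vw, (d, q6))
                heapq.heappush(vw, (c, q6))
            else:
                # q is composite: advance every multiple in tu that is <= q.
                a, b = tu[0]
                heapq.heapreplace(tu, (a + b, b))
                a, b = tu[0]
                while q >= a:
                    heapq.heapreplace(tu, (a + b, b))
                    a, b = tu[0]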
            q = qb
            q6 = q6b
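
    To get a feel for the gap between the two memory figures in this answer, the sketch below (not part of the original answer) evaluates the expression 4*pi(floor(sqrt(n))) next to the 2*pi(n) figure quoted for the other approaches. The helper names prime_count and compare_bounds are hypothetical; pi(x) is computed with a plain sieve of Eratosthenes, which is adequate for the moderate n used here.

```python
import math


def prime_count(x):
    """pi(x): the number of primes <= x, via a plain sieve of Eratosthenes."""
    if x < 2:
        return 0
    is_prime = bytearray([1]) * (x + 1)
    is_prime[0] = is_prime[1] = 0
    for i in range(2, math.isqrt(x) + 1):
        if is_prime[i]:
            # Clear every multiple of i from i*i upward.
            is_prime[i * i :: i] = bytes(len(range(i * i, x + 1, i)))
    return sum(is_prime)


def compare_bounds(n):
    """Return (4*pi(floor(sqrt(n))), 2*pi(n)) for primes up to n."""
    small_mem = 4 * prime_count(math.isqrt(n))
    grow_per_prime = 2 * prime_count(n)
    return small_mem, grow_per_prime


if __name__ == "__main__":
    for n in (100, 10_000, 1_000_000):
        print(n, compare_bounds(n))
```

    For n = 100 this gives (16, 50); the first value grows roughly with sqrt(n), the second roughly with n.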
    
  • 2020-11-22 02:23

    And another answer, more memory-efficient than my erat3 answer here:

    import heapq
    
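    def heapprimegen():
        hp = []  # heap of (next odd multiple, step) pairs
        yield 2
        yield 3
        cn = 3          # next odd candidate
        nn, inc = 3, 6  # smallest pending multiple and its step, kept out of
                        # the heap; seeded with 3 so that 9, 15, 21, ... follow
        while True:
            # Every odd candidate below the smallest pending multiple is prime.
            while cn < nn:
                yield cn
                # Record cn's odd multiples, starting at 3*cn with step 2*cn.
                heapq.heappush(hp, (3 * cn, 2 * cn))
                cn += 2
            # nn is not a new prime (3 at the start, a composite afterwards):
            # skip it, push its entry back advanced by one step, and take the
            # new smallest pending multiple.
            cn = nn + 2
            nn, inc = heapq.heappushpop(hp, (nn + inc, inc))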
    

    It maintains a heap (a list) of prime multiples rather than a dictionary. It loses some speed, obviously.

  • 2020-11-22 02:25

    For posterity, here's a rewrite of Will Ness's beautiful algorithm for Python 3. Some changes are needed (iterators no longer have .next() methods, but there's a new next() builtin function). Other changes are for fun (using the new yield from <iterable> replaces four yield statements in the original). More are for readability (I'm not a fan of overusing ;-) 1-letter variable names).
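
    The two Python 3 features mentioned above look like this in isolation (a toy example, not taken from the answer; first_four is a hypothetical name):

```python
def first_four():
    # One `yield from` replaces four separate `yield` statements.
    yield from (2, 3, 5, 7)


ps = first_four()
print(next(ps))  # 2 -- builtin next(); Python 2 code spelled this ps.next()
print(list(ps))  # [3, 5, 7] -- the remaining values
```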

    It's significantly faster than the original, but not for algorithmic reasons. The speedup is mostly due to removing the original's add() function, doing that inline instead.

    def psieve():
        import itertools
        yield from (2, 3, 5, 7)
        D = {}
        ps = psieve()
        next(ps)
        p = next(ps)
        assert p == 3
        psq = p*p
        for i in itertools.count(9, 2):
            if i in D:      # composite
                step = D.pop(i)
            elif i < psq:   # prime
                yield i
                continue
            else:           # composite, = p*p
                assert i == psq
                step = 2*p
                p = next(ps)
                psq = p*p
            i += step
            while i in D:
                i += step
            D[i] = step
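
    The answer attributes most of its speedup to doing the original add() work inline, which saves a Python-level function call per dictionary update. A rough micro-benchmark of that effect is sketched below. It is not the original add() (whose code is not shown here); it is the same skip-occupied-slots-then-insert logic used in psieve, written once as a hypothetical helper and once inline. Absolute timings depend on the interpreter and machine.

```python
import timeit


def add(D, m, step):
    # Hypothetical helper: skip multiples already claimed, then record step.
    while m in D:
        m += step
    D[m] = step


def fill_with_helper(items):
    D = {}
    for m, step in items:
        add(D, m, step)
    return D


def fill_inline(items):
    D = {}
    for m, step in items:
        # Same logic as add(), but without the per-item function call.
        while m in D:
            m += step
        D[m] = step
    return D


# Synthetic (k*k, 2*k) pairs for odd k, repeated so that some slots collide.
ITEMS = [(k * k, 2 * k) for k in range(3, 2001, 2)] * 5

if __name__ == "__main__":
    assert fill_with_helper(ITEMS) == fill_inline(ITEMS)
    for fn in (fill_with_helper, fill_inline):
        t = timeit.timeit(lambda: fn(ITEMS), number=20)
        print(f"{fn.__name__}: {t:.3f}s")
```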
    
  • 2020-11-22 02:26

    Another way to do it:

    import itertools
    def primeseq():
        prime = [2]  # every prime found so far
        yield 2
        for i in itertools.count(3, 2):  # odd candidates only
            is_prime = True
            # Trial division by known primes, stopping once num**2 > i.
            for num in prime:
                if i % num == 0:
                    is_prime = False
                    break
                elif num ** 2 > i:
                    break
            if is_prime:
                prime.append(i)
                yield i
    
  • 2020-11-22 02:26

    Here is a simple but not terribly slow one using a heap instead of a dict:

    import heapq
    
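    def heap_prime_gen_squares():
        yield 2
        yield 3
        h = [(9, 6)]  # (next odd multiple, step) pairs; starts with 3's entry
        n = 5         # next odd candidate
        while True:
            a, b = h[0]
            # Every odd n below the smallest pending multiple is prime.
            while n < a:
                yield n
                heapq.heappush(h, (n * n, n << 1))
                n += 2
            if n == a:
                n += 2  # n is composite: skip it.
            heapq.heapreplace(h, (a + b, b))  # Replace h[0], which is still (a, b).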
    

    My speed measurements of user time for the first 1 million primes (smaller numbers are better):

    • postponed_sieve (dict-based): 8.553s
    • erat2b (dict-based): 9.513s
    • erat2a (dict-based): 10.313s
    • heap_prime_gen_smallmem (heap-based): 23.935s
    • heap_prime_gen_squares (heap-based): 27.302s
    • heapprimegen (heap-based): 145.029s

    So dict-based approaches seem to be the fastest.
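
    A minimal sketch of how a "first N primes" timing like the one above could be reproduced. The harness and the demo generator are assumptions, not the code behind those measurements: time.process_time() reports user plus system CPU time of the current process (not user time alone), and dict_sieve is a basic dict-based incremental sieve included only so the block runs on its own.

```python
import itertools
import time


def dict_sieve():
    """Basic incremental sieve: maps each upcoming odd composite to its step."""
    yield 2
    D = {}
    for c in itertools.count(3, 2):
        step = D.pop(c, None)
        if step is None:
            yield c                # c is prime
            D[c * c] = 2 * c       # start crossing off its multiples at c*c
        else:
            m = c + step           # move this prime's marker to its next
            while m in D:          # odd multiple that is not already taken
                m += step
            D[m] = step


def time_first_n(gen_func, n):
    """Consume the first n values of gen_func(); return (last value, CPU seconds)."""
    start = time.process_time()
    last = None
    for last in itertools.islice(gen_func(), n):
        pass
    return last, time.process_time() - start


if __name__ == "__main__":
    print(time_first_n(dict_sieve, 100_000))
```

    Any zero-argument generator function from this page can be passed in place of dict_sieve.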

  • 2020-11-22 02:29

    Since the OP asks for an efficient implementation, here's a significant improvement to the ActiveState 2002 code by David Eppstein/Alex Martelli (seen here in his answer): don't record a prime's info in the dictionary until its square is seen among the candidates. This brings the space complexity down from O(n) to below O(sqrt(n)) for n primes produced (π(sqrt(n log n)) ~ 2 sqrt(n log n) / log(n log n) ~ 2 sqrt(n / log n)). Consequently, the time complexity also improves, i.e. it runs faster.
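
    The chain of estimates in the parenthesis above rests on two standard facts: the n-th prime satisfies p_n ~ n log n, and pi(x) ~ x / log x. With the postponement, the dictionary only holds entries for base primes up to sqrt(p_n), so, written out:

```latex
p_n \sim n \log n
\;\Longrightarrow\;
\pi\bigl(\sqrt{p_n}\bigr)
  \sim \frac{\sqrt{n \log n}}{\log\sqrt{n \log n}}
  = \frac{2\sqrt{n \log n}}{\log(n \log n)}
  \sim \frac{2\sqrt{n \log n}}{\log n}
  = 2\sqrt{\frac{n}{\log n}}
```

    Here log is the natural logarithm. Because log n in the denominator grows without bound, this count is asymptotically smaller than sqrt(n), which is where "below O(sqrt(n))" comes from.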

    It creates a "sliding sieve": a dictionary that maps the current multiple of each base prime (i.e. each prime below the square root of the current production point) to an iterator over that prime's later odd multiples, stepping by 2*p:

    from itertools import count
                                             # ideone.com/aVndFM
    def postponed_sieve():                   # postponed sieve, by Will Ness      
        yield 2; yield 3; yield 5; yield 7;  # original code David Eppstein, 
        sieve = {}                           #   Alex Martelli, ActiveState Recipe 2002
        ps = postponed_sieve()               # a separate base Primes Supply:
        p = next(ps) and next(ps)            # (3) a Prime to add to dict
        q = p*p                              # (9) its sQuare 
        for c in count(9,2):                 # the Candidate
            if c in sieve:               # c's a multiple of some base prime
                s = sieve.pop(c)         #     i.e. a composite ; or
            elif c < q:  
                yield c                  # a prime
                continue
            else:   # (c==q):            # or the next base prime's square:
                s=count(q+2*p,2*p)       #    (9+6, by 6 : 15,21,27,33,...)
                p=next(ps)               #    (5)
                q=p*p                    #    (25)
            for m in s:                  # the next multiple 
                if m not in sieve:       # no duplicates
                    break
            sieve[m] = s                 # original test entry: ideone.com/WFv4f
    

    (The older, original code here was edited to incorporate the changes seen in the answer by Tim Peters.) See also this for a related discussion.

    Similar 2-3-5-7 wheel-based code runs ~ 2.15x faster (which is very close to the theoretical improvement of 3/2 * 5/4 * 7/6 = 2.1875).
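
    The 2.1875 figure can be checked by counting candidates per period of 2*3*5*7 = 210: a sieve that skips only even numbers still visits 105 of every 210 integers, while a 2-3-5-7 wheel visits only the 48 that are coprime to 210, and 105/48 = 2.1875. A small sketch of that count, plus the gap sequence such wheel code steps through (wheel_gaps and wheel_candidates are hypothetical helpers, not code from the answer):

```python
from fractions import Fraction
from itertools import cycle, islice
from math import gcd

MODULUS = 2 * 3 * 5 * 7  # 210


def wheel_gaps(start=11):
    """Gaps between consecutive numbers coprime to 210, over one turn from start."""
    spokes = [k for k in range(start, start + MODULUS + 1) if gcd(k, MODULUS) == 1]
    return [b - a for a, b in zip(spokes, spokes[1:])]


def wheel_candidates(start=11):
    """Endless candidates not divisible by 2, 3, 5 or 7 (start must be coprime to 210)."""
    n = start
    for gap in cycle(wheel_gaps(start)):
        yield n
        n += gap


# Candidates per 210 integers: odd-only sieve vs. 2-3-5-7 wheel.
odd_per_turn = MODULUS // 2                                              # 105
wheel_per_turn = sum(1 for k in range(MODULUS) if gcd(k, MODULUS) == 1)  # 48
ratio = Fraction(odd_per_turn, wheel_per_turn)
assert ratio == Fraction(3, 2) * Fraction(5, 4) * Fraction(7, 6)

print(ratio, float(ratio))                  # 35/16 2.1875
print(list(islice(wheel_candidates(), 8)))  # [11, 13, 17, 19, 23, 29, 31, 37]
```

    The wheel only removes multiples of 2, 3, 5 and 7; its candidates still include composites such as 121 = 11*11, which the sieve itself must cross off.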
