Making Sieve of Eratosthenes more memory efficient in Python?

心在旅途 2021-01-20 00:41

Sieve of Eratosthenes memory constraint issue

I'm currently trying to implement a version of the Sieve of Eratosthenes for a Kattis problem; however, I am running in…

  •  伪装坚强ぢ
    2021-01-20 01:24

    This is a very challenging problem indeed. With a maximum possible N of 10^8, using one byte per value takes 10^8 bytes, about 100 MB, even assuming no overhead whatsoever. Even halving the data by storing only odd numbers puts you very close to 50 MB once overhead is considered.
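    For reference, here is the arithmetic behind these figures (decimal megabytes, no overhead). The one-bit-per-odd-number layout from strategy 2 below is added for comparison:

```latex
\begin{aligned}
10^8 \times 1\ \text{byte} &= 10^8\ \text{B} = 100\ \text{MB} \\
\tfrac{10^8}{2} \times 1\ \text{byte} &= 5 \times 10^7\ \text{B} = 50\ \text{MB} \\
\tfrac{10^8}{2} \times \tfrac{1}{8}\ \text{byte} &= 6.25 \times 10^6\ \text{B} = 6.25\ \text{MB}
\end{aligned}
```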

    This means the solution will have to make use of one or more of a few strategies:

    1. Using a more efficient data type for our array of primality flags. A Python list stores an array of pointers to its items (8 bytes each on 64-bit CPython). We effectively need raw binary storage, which in standard Python pretty much leaves only bytearray.
    2. Using only one bit per value in the sieve instead of an entire byte (a boolean technically needs only one bit, but it is typically stored in at least a full byte).
    3. Skipping even numbers, and possibly also multiples of 3, 5, 7, etc. (wheel factorization).
    4. Using a segmented sieve.
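    You can see the per-element cost behind points 1 and 2 by comparing a list and a bytearray with sys.getsizeof. This is a small sketch. It assumes CPython, where struct.calcsize("P") gives the pointer size (8 on a 64-bit build):

```python
import struct
import sys

N = 1_000_000

# A list holds one pointer per element (the elements here all share the
# cached int object 0, so the pointers are the only per-element cost).
as_list = [0] * N
# A bytearray holds each element directly in one byte.
as_bytes = bytearray(N)

pointer_size = struct.calcsize("P")  # 8 on a 64-bit build
# Subtract the size of an empty container to isolate the per-element storage.
list_payload = sys.getsizeof(as_list) - sys.getsizeof([])
bytes_payload = sys.getsizeof(as_bytes) - sys.getsizeof(bytearray())

print(f"pointer size: {pointer_size} bytes")
print(f"list:      {list_payload / N:.2f} bytes per element")
print(f"bytearray: {bytes_payload / N:.2f} bytes per element")
```

    On a 64-bit build the list costs 8 bytes per element, while the bytearray costs about 1.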

    I initially tried to solve the problem by storing only 1 bit per value in the sieve. Memory usage was indeed within the limit, but Python's slow bit manipulation pushed the execution time far past the time limit. It was also rather difficult to get the indexing right, so that the correct bits were reliably set and counted.
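    Most of the difficulty with one bit per value lies in the indexing. Here is a minimal sketch that assumes an odds-only layout. The names bit_sieve, count_primes and is_prime are hypothetical, and this is not the code I actually used. Bit j lives in byte j >> 3 at position j & 7 and stands for the odd number 2*j + 1. At n = 10^8 the array is about 6.25 MB. The inner loop does one Python-level operation per bit, which shows why this approach is slow in CPython:

```python
def bit_sieve(n):
    """Odds-only sieve storing one bit per odd number <= n.

    Bit j (byte j >> 3, bit j & 7) stands for the odd number 2*j + 1;
    a set bit means that number is NOT prime.
    """
    count = (n + 1) // 2                  # odd numbers 1, 3, ..., <= n
    bits = bytearray((count + 7) // 8)    # 8 flags per byte

    def mark(j):
        bits[j >> 3] |= 1 << (j & 7)

    def is_marked(j):
        return (bits[j >> 3] >> (j & 7)) & 1

    if count:
        mark(0)                           # 1 is not prime
    p = 3
    while p * p <= n:
        if not is_marked(p // 2):
            # Odd multiples p*p, p*p + 2p, ... sit at indices p*p // 2, +p, +2p, ...
            for j in range(p * p // 2, count, p):
                mark(j)                   # one Python-level operation per bit: slow
        p += 2
    return bits


def count_primes(bits, n):
    count = (n + 1) // 2
    # Padding bits past index count - 1 are never set, so a plain popcount works.
    composite_bits = sum(bin(b).count("1") for b in bits)
    return (1 if n >= 2 else 0) + count - composite_bits


def is_prime(bits, k):
    if k == 2:
        return 1
    if k < 2 or k % 2 == 0:
        return 0
    j = k // 2
    return 0 if (bits[j >> 3] >> (j & 7)) & 1 else 1


bits = bit_sieve(100)
print(count_primes(bits, 100), is_prime(bits, 97), is_prime(bits, 91))
```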

    I then implemented the odd-numbers-only solution using a bytearray. It was quite a bit faster, but memory was still an issue.

    Bytearray implementation (shown here storing one flag byte for every number up to n):

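    class Sieve:
        def __init__(self, n):
            # One flag byte per number: not_prime[k] == 1 means k is not prime.
            self.not_prime = bytearray(n + 1)
            self.not_prime[0] = self.not_prime[1] = 1
            for i in range(2, int(n ** 0.5) + 1):
                if not self.not_prime[i]:
                    # Mark multiples of i from i*i. A bytes object replaces [1]*k,
                    # which built a temporary list of 8-byte pointers, and
                    # len(range(...)) avoids copying the slice just to measure it.
                    self.not_prime[i*i::i] = b"\x01" * len(range(i*i, n + 1, i))
            # bytearray.count(1) counts the composite flags without a Python loop.
            self.n_prime = n + 1 - self.not_prime.count(1)

        def is_prime(self, n):
            return int(not self.not_prime[n])


    def main():
        n, q = map(int, input().split())
        s = Sieve(n)
        print(s.n_prime)
        for _ in range(q):
            i = int(input())
            print(s.is_prime(i))


    if __name__ == "__main__":
        main()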
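    The listing above flags every number up to n. The odds-only variant mentioned earlier could look like the following sketch. The class name OddSieve is hypothetical. It keeps the n_prime/is_prime interface, so main() would only need Sieve(n) swapped for OddSieve(n). Index j stands for the odd number 2*j + 1, which halves the flag array to about n/2 bytes:

```python
class OddSieve:
    """One flag byte per odd number <= n; flags[j] == 1 means 2*j + 1 is not prime."""

    def __init__(self, n):
        count = (n + 1) // 2              # odd numbers 1, 3, ..., <= n
        self.flags = bytearray(count)
        if count:
            self.flags[0] = 1             # index 0 represents 1, which is not prime
        p = 3
        while p * p <= n:
            if not self.flags[p // 2]:
                # Odd multiples of p from p*p are spaced 2*p apart in value,
                # which is p apart in index.
                start = p * p // 2
                self.flags[start::p] = b"\x01" * len(range(start, count, p))
            p += 2
        # Add the prime 2 back, then count the unflagged odd numbers.
        self.n_prime = (1 if n >= 2 else 0) + count - self.flags.count(1)

    def is_prime(self, k):
        if k == 2:
            return 1
        if k < 2 or k % 2 == 0:
            return 0
        return int(not self.flags[k // 2])


s = OddSieve(100)
print(s.n_prime, s.is_prime(97), s.is_prime(91))
```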
    

    Reducing memory further than this should make it work, although see the edit below.
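    One such further reduction takes strategy 3 a step further and skips multiples of 3 as well as 2. Above 3, only numbers of the form 6k + 1 and 6k + 5 can be prime, so the flag array shrinks to about n/3 bytes, roughly 33 MB for n = 10^8 instead of 50 MB. Below is a sketch under that assumption. The class name Wheel6Sieve is hypothetical, and it uses the same n_prime/is_prime interface as above:

```python
class Wheel6Sieve:
    """One flag byte per number coprime to 6 (1, 5, 7, 11, 13, ...).

    Index i stands for m = 3*i + 1 + (i & 1); conversely, such an m has
    index m // 3. flags[i] == 1 means m is not prime.
    """

    def __init__(self, n):
        # How many of 1, 5, 7, 11, ... are <= n (values 1 mod 6, plus 5 mod 6).
        count = (n - 1) // 6 + 1 + (n + 1) // 6 if n >= 1 else 0
        self.flags = bytearray(count)
        if count:
            self.flags[0] = 1  # index 0 represents 1, which is not prime
        i = 1
        while True:
            p = 3 * i + 1 + (i & 1)
            if p * p > n:
                break
            if not self.flags[i]:
                # Multiples p*q with q coprime to 6 and q >= p form two
                # progressions, q = p, p + 6, ... and q = q2, q2 + 6, ...,
                # where q2 is the next value coprime to 6 after p. A step of
                # 6 in q is a step of 6*p in value, i.e. 2*p in index.
                q2 = p + 4 - 2 * (i & 1)
                step = 2 * p
                for start in (p * p // 3, p * q2 // 3):
                    self.flags[start::step] = b"\x01" * len(range(start, count, step))
            i += 1
        # Add the primes 2 and 3 back, then count unflagged values coprime to 6.
        self.n_prime = int(n >= 2) + int(n >= 3) + count - self.flags.count(1)

    def is_prime(self, k):
        if k in (2, 3):
            return 1
        if k < 2 or k % 2 == 0 or k % 3 == 0:
            return 0
        return int(not self.flags[k // 3])


s = Wheel6Sieve(100)
print(s.n_prime, s.is_prime(97), s.is_prime(91))
```

    The index mapping m // 3 works because 6k + 1 maps to 2k and 6k + 5 maps to 2k + 1, so the candidates are packed with no gaps.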

    EDIT: Removing multiples of both 2 and 3 did not seem to reduce memory enough either, even though guppy.hpy().heap() suggested my usage was in fact a bit under 50 MB.
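    Here is one possible explanation. It is an assumption, not something verified against Kattis. guppy.hpy().heap() reports the objects alive at the moment it is called. A judge's memory limit most likely applies to the process's peak usage instead, and that peak also includes interpreter overhead and short-lived temporaries such as the right-hand side of a slice assignment. The standard-library tracemalloc module reports a peak for Python allocations. The helper names peak_bytes, mark_evens_with_list and mark_evens_with_bytes below are hypothetical:

```python
import tracemalloc


def peak_bytes(fn, *args):
    """Return the peak traced Python allocation (bytes) while fn(*args) runs."""
    tracemalloc.start()
    try:
        fn(*args)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()  # also clears the collected traces
    return peak


def mark_evens_with_list(n):
    flags = bytearray(n)
    # [1] * k is a temporary list holding k pointers (8 bytes each on 64-bit).
    flags[::2] = [1] * len(range(0, n, 2))
    return flags


def mark_evens_with_bytes(n):
    flags = bytearray(n)
    # b"\x01" * k is a temporary of only k bytes.
    flags[::2] = b"\x01" * len(range(0, n, 2))
    return flags


n = 10**6
print("list temporary peak: ", peak_bytes(mark_evens_with_list, n))
print("bytes temporary peak:", peak_bytes(mark_evens_with_bytes, n))
```

    For a 10^6-byte array, the list version's peak comes out more than twice the bytes version's peak, even though both functions return the same array.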
