Stack space overflow when computing primes

一整个雨季 2021-01-06 08:58

I'm working my way through Real World Haskell (I'm in chapter 4) and to practice a bit off-book I've created the following program to calculate the nth prime.

3 Answers
  • 2021-01-06 09:26

    First, you don't have tail recursion here, but guarded recursion, a.k.a. tail recursion modulo cons.
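    To make the distinction concrete, here is a minimal sketch (the names countFrom and countTo are hypothetical, not from the question). In guarded recursion the recursive call sits under a (:) constructor, so a consumer can take elements lazily from an infinite result; in tail recursion the recursive call is the entire result, so nothing is available until the loop finishes.

```haskell
-- Guarded recursion (tail recursion modulo cons): each cons cell is
-- produced before the recursive call is evaluated, so a consumer that
-- demands only a prefix works even though the list is infinite.
countFrom :: Int -> [Int]
countFrom n = n : countFrom (n + 1)

-- Tail recursion: the recursive call is the whole result, and the list
-- is only returned once the accumulator is complete, so it must terminate.
countTo :: Int -> Int -> [Int] -> [Int]
countTo n limit acc
  | n > limit = reverse acc
  | otherwise = countTo (n + 1) limit (n : acc)

main :: IO ()
main = do
  print (take 3 (countFrom 5)) -- [5,6,7]
  print (countTo 1 3 [])       -- [1,2,3]
```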

    The reason you're getting a stack overflow is, as others commented, a thunk pile-up. But where? One suggested culprit is your use of (++). While not optimal, the use of (++) does not necessarily lead to a thunk pile-up and a stack overflow. For instance, calling

    take 2 $ filter (isPrime primes) [15485860..]
    

    should produce [15485863,15485867] in no time, and without any stack overflow. But it is still the same code which uses (++), right?

    The problem is, you have two lists you call primes. One (at the top level) is infinite, co-recursively produced through guarded (not tail) recursion. Another (an argument to loop) is a finite list, built by adding each newly found prime to its end, used for testing.

    But when it is used for testing, it is not forced through to its end. If it were, there would be no stack overflow problem. It is only forced up to the square root of the number being tested, so the (++) thunks do pile up past that point.

    When isPrime primes 15485863 is called, it forces the top-level primes up to 3935, which is 547 primes. The internal testing-primes list too consists of 547 primes, of which only first 19 are forced.

    But when you call primes !! 1000000, out of the 1,000,000 primes in the duplicate internal list only 547 are forced. The rest are all in thunks.
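    The question's own code is not preserved on this page, so the following is only a hypothetical reconstruction of the pattern being described, to illustrate the two lists. The definitions of isPrime and loop are assumptions modelled on the answers, not the asker's code. The testing list grows by (++) on every new prime, but isPrime only demands it up to the square root of the candidate.

```haskell
-- HYPOTHETICAL reconstruction of the problematic pattern; the question's
-- real code is not shown here.

-- Trial division that only looks at primes up to sqrt n, so it never
-- forces the testing list past that point.
isPrime :: [Int] -> Int -> Bool
isPrime ps n = all (\p -> n `rem` p /= 0) (takeWhile (\p -> p * p <= n) ps)

primes :: [Int]
primes = 2 : 3 : loop [2, 3] 5
  where
    -- testing: a second, finite copy of the primes found so far.
    -- Each new prime is appended with (++); the part of this list beyond
    -- sqrt(test) is never demanded, so the appends pile up there as thunks.
    loop testing test
      | isPrime testing test = test : loop (testing ++ [test]) (test + 2)
      | otherwise            = loop testing (test + 2)

main :: IO ()
main = print (take 10 primes)
```

    For small inputs such as take 10 primes this runs fine; the pile-up only matters once the unforced tail of the testing list becomes very long, as in the primes !! 1000000 call discussed above.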

    If you added new primes to the end of the testing-primes list only when their square was seen among the candidates, the testing-primes list would always be forced completely, or nearly to its end, and there would be no thunk pile-up causing the stack overflow. Appending with (++) to the end of a forced list is not that bad when the next access forces the list to its end and leaves no thunks behind. (It still copies the list, though.)

    Of course the top-level primes list can be used directly, as Thomas M. DuBuisson shows in his answer.

    But the internal list has its uses. When correctly implemented, adding new primes to it only when their square is seen among the candidates, it may allow your program to run in O(sqrt(n)) space, when compiled with optimizations.
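    A minimal sketch of that idea, assuming plain trial division with rem: the testing list ps starts as [3], and a prime q is appended only when the candidate reaches q*q (which is itself composite and therefore skipped). Every prime candidate's test then walks all of ps, so no (++) thunks are left behind. For simplicity the next testing prime is taken from the top-level primes list itself; the names loop, ps and qs are illustrative. Because qs points back into primes, this version is not meant to demonstrate the O(sqrt(n)) space bound.

```haskell
-- Sketch: add a prime to the testing list only once its square is reached.
primes :: [Int]
primes = 2 : 3 : 5 : loop 7 [3] (drop 2 primes)
  where
    -- n:  current odd candidate
    -- ps: primes used for trial division (2 is skipped because n is odd)
    -- qs: primes not yet in ps; q is the next one, added when n == q * q
    loop :: Int -> [Int] -> [Int] -> [Int]
    loop n ps qs@(q : qs')
      | n == q * q = loop (n + 2) (ps ++ [q]) qs' -- q*q is composite: skip it
      | all (\p -> n `rem` p /= 0) ps = n : loop (n + 2) ps qs
      | otherwise = loop (n + 2) ps qs
    loop _ _ [] = [] -- unreachable: primes is infinite

main :: IO ()
main = print (primes !! 999) -- the 1000th prime
```

    Starting at 7 with 5 already in the list means the first pattern match on qs only needs the literal 5, so the self-reference never demands an element that has not been produced yet.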

  • 2021-01-06 09:29

    You should probably check these two questions:

    1. How can I increase the stack size with runhaskell?
    2. How to avoid stack space overflows?
  • 2021-01-06 09:35

    As I said in my comment, you shouldn't build a list by appending a single-element list to the end of a really long list (your line primes' = primes ++ [test]). It is better to just define the infinite list, primes, and let lazy evaluation do its thing. Something like the code below:

    primes = [2, 3] ++ loop 5
        where
            -- candidates are odd numbers, so step by 2
            loop test
                | isPrime primes test = test:(loop test')
                | otherwise = test' `seq` (loop test')
                where
                    test' = test + 2
    

    Obviously you don't need to parameterize the isPrime function by primes either, but that's just a nit. Also, when you know all the numbers are positive you should use rem instead of mod; this results in a 30% performance increase on my machine (when finding the millionth prime).
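    Putting both suggestions together, a self-contained sketch could look like the following. The question's isPrime is not preserved, so this isPrime (trial division by the already-produced primes up to the square root) is an assumption; it reads the top-level primes list directly instead of taking it as a parameter, and uses rem rather than mod since every value is positive.

```haskell
primes :: [Int]
primes = [2, 3] ++ loop 5
  where
    -- odd candidates only; isPrime forces test right away, so the
    -- (test + 2) thunks do not build up even without seq
    loop :: Int -> [Int]
    loop test
      | isPrime test = test : loop (test + 2)
      | otherwise    = loop (test + 2)

-- ASSUMED isPrime (not the asker's): trial division by primes up to sqrt n,
-- using rem. Only meaningful for n >= 2.
isPrime :: Int -> Bool
isPrime n = all (\p -> n `rem` p /= 0) (takeWhile (\p -> p * p <= n) primes)

main :: IO ()
main = print (primes !! 999) -- the 1000th prime
```

    The self-reference is safe because isPrime n only needs primes up to the first one whose square exceeds n, and that prime is always smaller than n, so it has already been produced.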
