Problem 10 from Project Euler is to find the sum of all the primes below given n.
I solved it simply by summing up the primes generated by the sieve of Eratosth
This code of mine evaluates the sum to 2⋅10^9 in 0.3 seconds and the sum to 10^12 (18435588552550705911377) in 19.6 seconds (if given sufficient RAM).
import Control.DeepSeq
import qualified Control.Monad as ControlMonad
import qualified Data.Array as Array
import qualified Data.Array.ST as ArrayST
import qualified Data.Array.Base as ArrayBase
primeLucy :: (Integer -> Integer) -> (Integer -> Integer) -> Integer -> (Integer->Integer)
primeLucy f sf n = g
where
r = fromIntegral $ integerSquareRoot n
ni = fromIntegral n
loop from to c = let go i = ControlMonad.when (to<=i) (c i >> go (i-1)) in go from
k = ArrayST.runSTArray $ do
k <- ArrayST.newListArray (-r,r) $ force $
[sf (div n (toInteger i)) - sf 1|i<-[r,r-1..1]] ++
[0] ++
[sf (toInteger i) - sf 1|i<-[1..r]]
ControlMonad.forM_ (takeWhile (<=r) primes) $ \p -> do
l <- ArrayST.readArray k (p-1)
let q = force $ f (toInteger p)
let adjust = \i j -> do { v <- ArrayBase.unsafeRead k (i+r); w <- ArrayBase.unsafeRead k (j+r); ArrayBase.unsafeWrite k (i+r) $!! v+q*(l-w) }
loop (-1) (-div r p) $ \i -> adjust i (i*p)
loop (-div r p-1) (-min r (div ni (p*p))) $ \i -> adjust i (div (-ni) (i*p))
loop r (p*p) $ \i -> adjust i (div i p)
return k
g :: Integer -> Integer
g m
| m >= 1 && m <= integerSquareRoot n = k Array.! (fromIntegral m)
| m >= integerSquareRoot n && m <= n && div n (div n m)==m = k Array.! (fromIntegral (negate (div n m)))
| otherwise = error $ "Function not precalculated for value " ++ show m
primeSum :: Integer -> Integer
primeSum n = (primeLucy id (\m -> div (m*m+m) 2) n) n
If your integerSquareRoot
function is buggy (as reportedly some are), you can replace it here with floor . sqrt . fromIntegral
.
Explanation:
As the name suggests it is based upon a generalization of the famous method by "Lucy Hedgehog" eventually discovered by the original poster.
It allows you to calculate many sums of the form (with p prime) without enumerating all the primes up to N and in time O(N^0.75).
Its inputs are the function f (i.e., id
if you want the prime sum), its summatory function over all the integers (i.e., in that case the sum of the first m integers or div (m*m+m) 2
), and N.
PrimeLucy
returns a lookup function (with p prime) restricted to certain values of n: .
I've done some small improvements so it runs in 3.4-3.5
seconds on my machine.
Using IntMap.Strict
helped a lot. Other than that I just manually performed some ghc
optimizations just to be sure. And make Haskell code more close to Python code from your link. As a next step you could try to use some mutable HashMap
. But I'm not sure... IntMap
can't be much faster than some mutable container because it's an immutable one. Though I'm still surprised about it's efficiency. I hope this can be implemented faster.
Here is the code:
import Data.List (foldl')
import Data.IntMap.Strict (IntMap, (!))
import qualified Data.IntMap.Strict as IntMap
p :: Int -> Int
p n = (sieve (IntMap.fromList [(i, i * (i + 1) `div` 2 - 1) | i <- vs]) 2 r vs) ! n
where vs = [n `div` i | i <- [1..r]] ++ [n', n' - 1 .. 1]
r = floor (sqrt (fromIntegral n) :: Double)
n' = n `div` r - 1
sieve :: IntMap Int -> Int -> Int -> [Int] -> IntMap Int
sieve m' p' r vs = go m' p'
where
go m p | p > r = m
| m ! p > m ! (p - 1) = go (update m vs p) (p + 1)
| otherwise = go m (p + 1)
update :: IntMap Int -> [Int] -> Int -> IntMap Int
update s vs p = foldl' decrease s (takeWhile (>= p2) vs)
where
sp = s ! (p - 1)
p2 = p * p
sumOfSieved v = p * (s ! (v `div` p) - sp)
decrease m v = IntMap.adjust (subtract $ sumOfSieved v) v m
main :: IO ()
main = print $ p $ 2*10^(9 :: Int)
UPDATE:
Using mutable hashtables
I've managed to make performance up to ~5.5sec
on Haskell with this implementation.
Also, I used unboxed vectors instead of lists in several places. Linear
hashing seems to be the fastest. I think this can be done even faster. I noticed sse42 option in hasthables
package. Not sure I've managed to set it correctly but even without it runs that fast.
UPDATE 2 (19.06.2017)
I've managed to make it 3x
faster then best solution from @Krom (using my code + his map) by dropping judy hashmap at all. Instead just plain arrays are used. You can come up with the same idea if you notice that keys for S
hashmap are either sequence from 1
to n'
or n div i
for i
from 1
to r
. So we can represent such HashMap as two arrays making lookups in array depending on searching key.
My code + Judy HashMap
$ time ./judy
95673602693282040
real 0m0.590s
user 0m0.588s
sys 0m0.000s
My code + my sparse map
$ time ./sparse
95673602693282040
real 0m0.203s
user 0m0.196s
sys 0m0.004s
This can be done even faster if instead of IOUArray
already generated vectors and Vector
library is used and readArray
is replaced by unsafeRead
. But I don't think this should be done if only you're not really interested in optimizing this as much as possible.
Comparison with this solution is cheating and is not fair. I expect same ideas implemented in Python and C++ will be even faster. But @Krom solution with closed hashmap is already cheating because it uses custom data structure instead of standard one. At least you can see that standard and most popular hash maps in Haskell are not that fast. Using better algorithms and better ad-hoc data structures can be better for such problems.
Here's resulting code.
First as a baseline, the timings of the existing approaches on my machine:
Original program posted in the question:
time stack exec primorig
95673602693282040
real 0m4.601s
user 0m4.387s
sys 0m0.251s
Second the version using Data.IntMap.Strict
from
here
time stack exec primIntMapStrict
95673602693282040
real 0m2.775s
user 0m2.753s
sys 0m0.052s
Shershs code with Data.Judy dropped in here
time stack exec prim-hash2
95673602693282040
real 0m0.945s
user 0m0.955s
sys 0m0.028s
Your python solution.
I compiled it with
python -O -m py_compile problem10.py
and the timing:
time python __pycache__/problem10.cpython-36.opt-1.pyc
95673602693282040
real 0m1.163s
user 0m1.160s
sys 0m0.003s
Your C++ version:
$ g++ -O2 --std=c++11 p10.cpp -o p10
$ time ./p10
sum(2000000000) = 95673602693282040
real 0m0.314s
user 0m0.310s
sys 0m0.003s
I didn't bother to provide a baseline for slow.hs, as I didn't
want to wait for it to complete when run with an argument of
2*10^9
.
The following program runs in under a second on my machine.
It uses a hand rolled hashmap, which uses closed hashing with linear probing and uses some variant of knuths hashfunction, see here.
Certainly it is somewhat tailored to the case, as the lookup function for example expects the searched keys to be present.
Timings:
time stack exec prim
95673602693282040
real 0m0.725s
user 0m0.714s
sys 0m0.047s
First I implemented my hand rolled hashmap simply to hash the keys with
key `mod` size
and selected a size multiple times higher than the expected input, but the program took 22s or more to complete.
Finally it was a matter of choosing a hash function which was good for the workload.
Here is the program:
import Data.Maybe
import Control.Monad
import Data.Array.IO
import Data.Array.Base (unsafeRead)
type Number = Int
data Map = Map { keys :: IOUArray Int Number
, values :: IOUArray Int Number
, size :: !Int
, factor :: !Int
}
newMap :: Int -> Int -> IO Map
newMap s f = do
k <- newArray (0, s-1) 0
v <- newArray (0, s-1) 0
return $ Map k v s f
storeKey :: IOUArray Int Number -> Int -> Int -> Number -> IO Int
storeKey arr s f key = go ((key * f) `mod` s)
where
go :: Int -> IO Int
go ind = do
v <- readArray arr ind
go2 v ind
go2 v ind
| v == 0 = do { writeArray arr ind key; return ind; }
| v == key = return ind
| otherwise = go ((ind + 1) `mod` s)
loadKey :: IOUArray Int Number -> Int -> Int -> Number -> IO Int
loadKey arr s f key = s `seq` key `seq` go ((key *f) `mod` s)
where
go :: Int -> IO Int
go ix = do
v <- unsafeRead arr ix
if v == key then return ix else go ((ix + 1) `mod` s)
insertIntoMap :: Map -> (Number, Number) -> IO Map
insertIntoMap m@(Map ks vs s f) (k, v) = do
ix <- storeKey ks s f k
writeArray vs ix v
return m
fromList :: Int -> Int -> [(Number, Number)] -> IO Map
fromList s f xs = do
m <- newMap s f
foldM insertIntoMap m xs
(!) :: Map -> Number -> IO Number
(!) (Map ks vs s f) k = do
ix <- loadKey ks s f k
readArray vs ix
mupdate :: Map -> Number -> (Number -> Number) -> IO ()
mupdate (Map ks vs s fac) i f = do
ix <- loadKey ks s fac i
old <- readArray vs ix
let x' = f old
x' `seq` writeArray vs ix x'
r' :: Number -> Number
r' = floor . sqrt . fromIntegral
vs' :: Integral a => a -> a -> [a]
vs' n r = [n `div` i | i <- [1..r]] ++ reverse [1..n `div` r - 1]
vss' n r = r + n `div` r -1
list' :: Int -> Int -> [Number] -> IO Map
list' s f vs = fromList s f [(i, i * (i + 1) `div` 2 - 1) | i <- vs]
problem10 :: Number -> IO Number
problem10 n = do
m <- list' (19*vss) (19*vss+7) vs
nm <- sieve m 2 r vs
nm ! n
where vs = vs' n r
vss = vss' n r
r = r' n
sieve :: Map -> Number -> Number -> [Number] -> IO Map
sieve m p r vs | p > r = return m
| otherwise = do
v1 <- m ! p
v2 <- m ! (p - 1)
nm <- if v1 > v2 then update m vs p else return m
sieve nm (p + 1) r vs
update :: Map -> [Number] -> Number -> IO Map
update m vs p = foldM (decrease p) m $ takeWhile (>= p*p) vs
decrease :: Number -> Map -> Number -> IO Map
decrease p m k = do
v <- sumOfSieved m k p
mupdate m k (subtract v)
return m
sumOfSieved :: Map -> Number -> Number -> IO Number
sumOfSieved m v p = do
v1 <- m ! (v `div` p)
v2 <- m ! (p - 1)
return $ p * (v1 - v2)
main = do { n <- problem10 (2*10^9) ; print n; } -- 2*10^9
I am not a professional with hashing and that sort of stuff, so this can certainly be improved a lot. Maybe we Haskellers should improve the of the shelf hash maps or provide some simpler ones.
If I plug my hashmap in Shershs (see answer below) code, see here we are even down to
time stack exec prim-hash2
95673602693282040
real 0m0.601s
user 0m0.604s
sys 0m0.034s
If you read through the source
for the function insert
in Data.HashTable.ST.Basic
, you
will see that it deletes the old key value pair and inserts
a new one. It doesn't look up the "place" for the value and
mutate it, as one might imagine, if one reads that it is
a "mutable" hashtable. Here the hashtable itself is mutable,
so you don't need to copy the whole hashtable for insertion
of a new key value pair, but the value places for the pairs
are not. I don't know if that is the whole story of slow.hs
being slow, but my guess is, it is a pretty big part of it.
So that's the idea I followed while trying to improve your program the first time.
See, you don't need a mutable mapping from keys to values. Your key set is fixed. You want a mapping from keys to mutable places. (Which is, by the way, what you get from C++ by default.)
And so I tried to come up with that. I used IntMap IORef
from
Data.IntMap.Strict
and Data.IORef
first and got a timing
of
tack exec prim
95673602693282040
real 0m2.134s
user 0m2.141s
sys 0m0.028s
I thought maybe it would help to work with unboxed values
and to get that, I used IOUArray Int Int
with 1 element
each instead of IORef
and got those timings:
time stack exec prim
95673602693282040
real 0m2.015s
user 0m2.018s
sys 0m0.038s
Not much of a difference and so I tried to get rid of bounds
checking in the 1 element arrays by using unsafeRead
and
unsafeWrite
and got a timing of
time stack exec prim
95673602693282040
real 0m1.845s
user 0m1.850s
sys 0m0.030s
which was the best I got using Data.IntMap.Strict
.
Of course I ran each program multiple times to see if the times are stable and the differences in run time aren't just noise.
It looks like these are all just micro-optimizations.
And here is the program that ran fastest for me without using a hand rolled data structure:
import qualified Data.IntMap.Strict as M
import Control.Monad
import Data.Array.IO
import Data.Array.Base (unsafeRead, unsafeWrite)
type Number = Int
type Place = IOUArray Number Number
type Map = M.IntMap Place
tupleToRef :: (Number, Number) -> IO (Number, Place)
tupleToRef = traverse (newArray (0,0))
insertRefs :: [(Number, Number)] -> IO [(Number, Place)]
insertRefs = traverse tupleToRef
fromList :: [(Number, Number)] -> IO Map
fromList xs = M.fromList <$> insertRefs xs
(!) :: Map -> Number -> IO Number
(!) m i = unsafeRead (m M.! i) 0
mupdate :: Map -> Number -> (Number -> Number) -> IO ()
mupdate m i f = do
let place = m M.! i
old <- unsafeRead place 0
let x' = f old
-- make the application of f strict
x' `seq` unsafeWrite place 0 x'
r' :: Number -> Number
r' = floor . sqrt . fromIntegral
vs' :: Integral a => a -> a -> [a]
vs' n r = [n `div` i | i <- [1..r]] ++ reverse [1..n `div` r - 1]
list' :: [Number] -> IO Map
list' vs = fromList [(i, i * (i + 1) `div` 2 - 1) | i <- vs]
problem10 :: Number -> IO Number
problem10 n = do
m <- list' vs
nm <- sieve m 2 r vs
nm ! n
where vs = vs' n r
r = r' n
sieve :: Map -> Number -> Number -> [Number] -> IO Map
sieve m p r vs | p > r = return m
| otherwise = do
v1 <- m ! p
v2 <- m ! (p - 1)
nm <- if v1 > v2 then update m vs p else return m
sieve nm (p + 1) r vs
update :: Map -> [Number] -> Number -> IO Map
update m vs p = foldM (decrease p) m $ takeWhile (>= p*p) vs
decrease :: Number -> Map -> Number -> IO Map
decrease p m k = do
v <- sumOfSieved m k p
mupdate m k (subtract v)
return m
sumOfSieved :: Map -> Number -> Number -> IO Number
sumOfSieved m v p = do
v1 <- m ! (v `div` p)
v2 <- m ! (p - 1)
return $ p * (v1 - v2)
main = do { n <- problem10 (2*10^9) ; print n; } -- 2*10^9
If you profile that, you see that it spends most of the time in the custom lookup function (!)
,
don't know how to improve that further. Trying to inline (!)
with {-# INLINE (!) #-}
didn't yield better results; maybe ghc already did this.
Try this and let me know how fast it is:
-- sum of primes
import Control.Monad (forM_, when)
import Control.Monad.ST
import Data.Array.ST
import Data.Array.Unboxed
sieve :: Int -> UArray Int Bool
sieve n = runSTUArray $ do
let m = (n-1) `div` 2
r = floor . sqrt $ fromIntegral n
bits <- newArray (0, m-1) True
forM_ [0 .. r `div` 2 - 1] $ \i -> do
isPrime <- readArray bits i
when isPrime $ do
let a = 2*i*i + 6*i + 3
b = 2*i*i + 8*i + 6
forM_ [a, b .. (m-1)] $ \j -> do
writeArray bits j False
return bits
primes :: Int -> [Int]
primes n = 2 : [2*i+3 | (i, True) <- assocs $ sieve n]
main = do
print $ sum $ primes 1000000
You can run it on ideone. My algorithm is the Sieve of Eratosthenes, and it should be quite fast for small n. For n = 2,000,000,000, the array size may be a problem, in which case you will need to use a segmented sieve. See my blog for more information about the Sieve of Eratosthenes. See this answer for information about a segmented sieve (but not in Haskell, unfortunately).