Project Euler 14: performance compared to C and memoization

后端未结

关注

 3  571

情歌与酒

I\'m currently working on project euler problem 14.

I solved it using a poorly coded program, without memoization, that took ~~386~~ 5 seconds to run (s

相关标签:

3条回答

-上瘾入骨i

2021-01-11 14:41

Well, the C program uses unsigned long, but Integer can store arbitrarily large integers (it's a bignum). If you import Data.Word, then you can use Word, which is a machine-word-sized unsigned integer.

After replacing Integer with Word, and using ghc -O2 and gcc -O3, the C program runs in 0.72 seconds, while the Haskell programs runs in 1.92 seconds. 2.6x isn't bad. However, ghc -O2 doesn't always help, and this is one of the programs on which it doesn't! Using just -O, as you did, brings the runtime down to 1.90 seconds.

I tried replacing div with quot (which uses the same type of division as C; they only differ on negative inputs), but strangely it actually made the Haskell program run slightly slower for me.

You should be able to speed up the syr function with the help of this previous Stack Overflow question I answered about the same Project Euler problem.

0 讨论(0)
发布评论:

提交评论
- 加载中...
灰色年华

2021-01-11 14:47
After having compiled it with optimisations, there are still several differences to the C programme
- you use div, while the C programme uses machine division (which truncates) [but any self-respecting C compiler transforms that into a shift, so that makes it yet faster], that would be quot in Haskell; that reduced the run time by some 15% here.
- the C programme uses fixed-width 64-bit (or even 32-bit, but then it's just luck that it gets the correct answer, since some intermediate values exceed 32-bit range) integers, the Haskell programme uses arbitrary precision Integers. If you have 64-bit Ints in your GHC (64-bit OS other than Windows), replace Integer with Int. That reduced the run time by a factor of about 3 here. If you're on a 32-bit system, you're out of luck, GHC doesn't use native 64-bit instructions there, these operations are implemented as C calls, that's still rather slow.
For the memoisation, you can outsource it to one of the memoisation packages on hackage, the only one that I remember is data-memocombinators, but there are others. Or you can do it yourself, for example keeping a map of previously calculated values - that would work best in the State monad,
```
import Control.Monad.State.Strict
import qualified Data.Map as Map
import Data.Map (Map, singleton)

type Memo = Map Integer Int

syr :: Integer -> State Memo Int
syr n = do
    mb <- gets (Map.lookup n)
    case mb of
      Just l -> return l
      Nothing -> do
          let m = if even n then n `quot` 2 else 3*n+1
          l <- syr m
          let l' = l+1
          modify (Map.insert n l')
          return l'

solve :: Integer -> Int -> Integer -> State Memo (Integer,Int)
solve maxi len start
    | len > 1000000 = return (maxi,len)
    | otherwise = do
         l <- syr start
         if len < l
             then solve start l (start+1)
             else solve maxi len (start+1)

p14 :: (Integer,Int)
p14 = evalState (solve 0 0 500000) (singleton 1 1)
```
but that will probably not gain too much (not even when you've added the necessary strictness). The trouble is that a lookup in a Map is not too cheap and an insertion is relatively expensive.

Another method is to keep a mutable array for the lookup. The code becomes more complicated, since you have to choose a reasonable upper bound for the values to cache (should be not much larger than the bound for the starting values) and deal with the parts of the sequences falling outside the memoised range. But an array lookup and write are fast. If you have 64-bit Ints, the below code runs pretty fast, here it takes 0.03s for a limit of 1 million, and 0.33s for a limit of 10 million, the corresponding (as closely as I reasonably could) C code runs in 0.018 resp. 0.2s.
```
module Main (main) where

import System.Environment (getArgs)
import Data.Array.ST
import Data.Array.Base
import Control.Monad.ST
import Data.Bits
import Data.Int

main :: IO ()
main = do
    args <- getArgs
    let bd = case args of
               a:_ -> read a
               _   -> 100000
    print $ collMax bd

next :: Int -> Int
next n
    | n .&. 1 == 0  = n `unsafeShiftR` 1
    | otherwise     = 3*n + 1

collMax :: Int -> (Int,Int16)
collMax upper = runST $ do
    arr <- newArray (0,upper) 0 :: ST s (STUArray s Int Int16)
    let go l m
            | upper < m = go (l+1) $ next m
            | otherwise = do
                l' <- unsafeRead arr m
                case l' of
                  0 -> do
                      l'' <- go 1 $ next m
                      unsafeWrite arr m (l'' + 1)
                      return (l+l'')
                  _ -> return (l+l'-1)
        collect mi ml i
            | upper < i = return (mi, ml)
            | otherwise = do
                l <- go 1 i
                if l > ml
                  then collect i l (i+1)
                  else collect mi ml (i+1)
    unsafeWrite arr 1 1
    collect 1 1 2
```
0 讨论(0)
发布评论:

提交评论
- 加载中...

孤城傲影

2021-01-11 14:47

On my current system (32-bit Core2Duo) your Haskell code, including all the optimizations given in the answers, takes 0.8s to compile and 1.2s to run.

You could transfer the run-time to compile-time, and reduce the run-time to nearly zero.

module Euler14 where

import Data.Word
import Language.Haskell.TH

terms :: Word -> Word
terms n = countTerms n 0
  where
    countTerms 1 acc             = acc + 1
    countTerms n acc | even n    = countTerms (n `div` 2) (acc + 1)
                     | otherwise = countTerms (3 * n + 1) (acc + 1)

longestT :: Word -> Word -> (Word, Word) 
longestT mi mx = find mi mx (0, 0)
  where
      find mi mx (ct,cn) | mi == mx  = if ct > terms mi then (ct,cn) else (terms mi, mi)
                         | otherwise = find (mi + 1) mx
                                       (if ct > terms mi then (ct,cn) else (terms mi, mi))

longest :: Word -> Word -> ExpQ
longest mi mx = return $ TupE [LitE (IntegerL (fromIntegral a)),
                               LitE (IntegerL (fromIntegral b))]
  where
    (a,b) = longestT mi mx

and

{-# LANGUAGE TemplateHaskell #-}
import Euler14

main = print $(longest 500000 999999)

On my system it takes 2.3s to compile this but the run-time goes down to 0.003s. Compile Time Function Execution (CTFE) is something you can't do in C/C++. The only other programming language that I know of that supports CTFE is the D programming language. And just to be complete, the C code takes 0.1s to compile and 0.7s to run.

0 讨论(0)