Project Euler 14: performance compared to C and memoization

情歌与酒 2021-01-11 14:34

I'm currently working on Project Euler problem 14.

I solved it using a poorly coded program, without memoization, that took 386 5 seconds to run (s

3 answers
  • 2021-01-11 14:41

    Well, the C program uses unsigned long, but Integer can store arbitrarily large integers (it's a bignum). If you import Data.Word, then you can use Word, which is a machine-word-sized unsigned integer.
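
    As a minimal sketch of that change (the name chainLength is illustrative and not taken from the question's code), a chain-length function using Word instead of Integer could look like this. It assumes a 64-bit GHC, where Word is 64 bits wide:

```haskell
{-# LANGUAGE BangPatterns #-}
import Data.Word (Word)

-- Number of terms in the Collatz chain starting at n (n >= 1),
-- counting both n itself and the final 1.
-- Hypothetical helper; assumes Word is wide enough for every intermediate value.
chainLength :: Word -> Int
chainLength = go 1
  where
    go !acc 1 = acc
    go !acc n
        | even n    = go (acc + 1) (n `quot` 2)
        | otherwise = go (acc + 1) (3 * n + 1)

main :: IO ()
main = print (chainLength 13)   -- 13, 40, 20, 10, 5, 16, 8, 4, 2, 1: prints 10
```

    Only the type signature differs from an Integer version; the arithmetic is unchanged, but it now compiles to machine-word operations.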

    After replacing Integer with Word, and using ghc -O2 and gcc -O3, the C program runs in 0.72 seconds, while the Haskell program runs in 1.92 seconds. 2.6x isn't bad. However, ghc -O2 doesn't always help, and this is one of the programs on which it doesn't! Using just -O, as you did, brings the runtime down to 1.90 seconds.

    I tried replacing div with quot (which uses the same type of division as C; they only differ on negative inputs), but strangely it actually made the Haskell program run slightly slower for me.
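
    For reference, a small sketch of where the two differ. div rounds toward negative infinity, while quot truncates toward zero as C's integer division does:

```haskell
-- div/mod round toward negative infinity; quot/rem truncate toward zero.
-- The two pairs agree whenever both operands are non-negative,
-- which is always the case for Collatz sequences.
main :: IO ()
main = do
    print (7 `div` 2, 7 `quot` 2)          -- (3,3)
    print ((-7) `div` 2, (-7) `quot` 2)    -- (-4,-3)
    print ((-7) `mod` 2, (-7) `rem` 2)     -- (1,-1)
```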

    You should be able to speed up the syr function with the help of this previous Stack Overflow question I answered about the same Project Euler problem.

  • 2021-01-11 14:47

    Even after compiling with optimisations, several differences from the C programme remain:

    • you use div, while the C programme uses machine division, which truncates (and which any self-respecting C compiler turns into a shift, making it faster still). The Haskell equivalent is quot; switching to it reduced the run time by some 15% here.
    • the C programme uses fixed-width 64-bit integers (or even 32-bit ones, in which case it is just luck that it gets the correct answer, since some intermediate values exceed the 32-bit range), whereas the Haskell programme uses arbitrary-precision Integers. If your GHC has 64-bit Ints (a 64-bit OS other than Windows), replace Integer with Int; that reduced the run time by a factor of about 3 here. On a 32-bit system you are out of luck: GHC doesn't use native 64-bit instructions there, so these operations are implemented as C calls, and that is still rather slow.
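
    The point about the 32-bit range can be checked directly. The following is only a sketch: peak, limit32 and overflowing are names made up for this illustration, and the peaks are computed in Word64 so that they are not themselves truncated:

```haskell
import Data.Word (Word32, Word64)

-- Largest value reached by the Collatz sequence starting at n (n >= 1).
peak :: Word64 -> Word64
peak n = maximum (takeWhile (/= 1) (iterate step n) ++ [1])
  where
    step m | even m    = m `quot` 2
           | otherwise = 3 * m + 1

-- Largest value an unsigned 32-bit integer can hold.
limit32 :: Word64
limit32 = fromIntegral (maxBound :: Word32)

-- Starting values below one million whose sequence leaves the 32-bit range.
overflowing :: [Word64]
overflowing = [n | n <- [1 .. 999999], peak n > limit32]

main :: IO ()
main = print (take 1 overflowing)   -- first starting value that would overflow
```

    Because the list is lazy, take 1 stops at the first offending starting value instead of scanning the whole range.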

    For the memoisation, you can outsource it to one of the memoisation packages on Hackage; the only one I remember is data-memocombinators, but there are others. Or you can do it yourself, for example by keeping a map of previously calculated values, which works best in the State monad:

    import Control.Monad.State.Strict
    import qualified Data.Map as Map
    import Data.Map (Map, singleton)
    
    type Memo = Map Integer Int
    
    syr :: Integer -> State Memo Int
    syr n = do
        mb <- gets (Map.lookup n)
        case mb of
          Just l -> return l
          Nothing -> do
              let m = if even n then n `quot` 2 else 3*n+1
              l <- syr m
              let l' = l+1
              modify (Map.insert n l')
              return l'
    
    solve :: Integer -> Int -> Integer -> State Memo (Integer,Int)
    solve maxi len start
        -- stop once every starting value up to one million has been tried
        | start > 1000000 = return (maxi,len)
        | otherwise = do
             l <- syr start
             if len < l
                 then solve start l (start+1)
                 else solve maxi len (start+1)
    
    p14 :: (Integer,Int)
    p14 = evalState (solve 0 0 500000) (singleton 1 1)
    

    but that will probably not gain too much (not even when you've added the necessary strictness). The trouble is that a lookup in a Map is not too cheap and an insertion is relatively expensive.

    Another method is to keep a mutable array for the lookup. The code becomes more complicated, since you have to choose a reasonable upper bound for the values to cache (it should not be much larger than the bound for the starting values) and deal with the parts of the sequences that fall outside the memoised range. But array lookups and writes are fast. If you have 64-bit Ints, the code below runs pretty fast: here it takes 0.03s for a limit of 1 million and 0.33s for a limit of 10 million, while the corresponding C code (as close as I could reasonably make it) runs in 0.018s and 0.2s respectively.

    module Main (main) where
    
    import System.Environment (getArgs)
    import Data.Array.ST
    import Data.Array.Base
    import Control.Monad.ST
    import Data.Bits
    import Data.Int
    
    main :: IO ()
    main = do
        args <- getArgs
        let bd = case args of
                   a:_ -> read a
                   _   -> 100000
        print $ collMax bd
    
    next :: Int -> Int
    next n
        | n .&. 1 == 0  = n `unsafeShiftR` 1
        | otherwise     = 3*n + 1
    
    collMax :: Int -> (Int,Int16)
    collMax upper = runST $ do
        arr <- newArray (0,upper) 0 :: ST s (STUArray s Int Int16)
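        -- go l m: l terms have been counted so far (including m);
        -- returns the length of the whole chain
        let go l m
                -- outside the cached range: just keep stepping
                | upper < m = go (l+1) $ next m
                | otherwise = do
                    l' <- unsafeRead arr m
                    case l' of
                      -- 0 marks "not cached yet": compute, store, then combine
                      0 -> do
                          l'' <- go 1 $ next m
                          unsafeWrite arr m (l'' + 1)
                          return (l+l'')
                      _ -> return (l+l'-1)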
            collect mi ml i
                | upper < i = return (mi, ml)
                | otherwise = do
                    l <- go 1 i
                    if l > ml
                      then collect i l (i+1)
                      else collect mi ml (i+1)
        unsafeWrite arr 1 1
        collect 1 1 2
    
  • 2021-01-11 14:47

    On my current system (32-bit Core2Duo) your Haskell code, including all the optimizations given in the answers, takes 0.8s to compile and 1.2s to run.

    You could move the computation from run time to compile time, and so reduce the run time to nearly zero:

    module Euler14 where
    
    import Data.Word
    import Language.Haskell.TH
    
    terms :: Word -> Word
    terms n = countTerms n 0
      where
        countTerms 1 acc             = acc + 1
        countTerms n acc | even n    = countTerms (n `div` 2) (acc + 1)
                         | otherwise = countTerms (3 * n + 1) (acc + 1)
    
    longestT :: Word -> Word -> (Word, Word) 
    longestT mi mx = find mi mx (0, 0)
      where
          find mi mx (ct,cn) | mi == mx  = if ct > terms mi then (ct,cn) else (terms mi, mi)
                             | otherwise = find (mi + 1) mx
                                           (if ct > terms mi then (ct,cn) else (terms mi, mi))
    
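    -- Built with the tupE/litE helpers, which work whether the raw
    -- TupE constructor takes [Exp] or [Maybe Exp] (it changed between
    -- template-haskell versions).
    longest :: Word -> Word -> ExpQ
    longest mi mx = tupE [litE (integerL (fromIntegral a)),
                          litE (integerL (fromIntegral b))]
      where
        (a,b) = longestT mi mx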
    

    and

    {-# LANGUAGE TemplateHaskell #-}
    import Euler14
    
    main = print $(longest 500000 999999)
    

    On my system it takes 2.3s to compile this but the run-time goes down to 0.003s. Compile Time Function Execution (CTFE) is something you can't do in C/C++. The only other programming language that I know of that supports CTFE is the D programming language. And just to be complete, the C code takes 0.1s to compile and 0.7s to run.
