Ackermann very inefficient with Haskell/GHC

小鲜肉 2020-12-23 13:58

I tried computing Ackermann(4,1) and there's a big difference in performance between different languages/compilers. Below are results on my

7 answers
  • 2020-12-23 14:01

    A Haskell version that looks like the way you wrote it in C is not really the same algorithm, because the semantics of recursion are quite different in the two languages.

    Here is a version using the same mathematical algorithm, but where we represent calls to the Ackermann function symbolically using a data type. That way, we can control the semantics of the recursion more precisely.

    When compiled with optimization, this version runs in constant memory, but it is slow - about 4.5 minutes in an environment similar to yours. But I'm sure it could be modified to be much faster. This is just to give the idea.

    data Ack = Ack !Int
    
    ack :: Int -> Int -> Int
    ack m n = length . ackR $ Ack m : replicate n (Ack 0)
      where
        ackR n@(Ack 0 : _) = n
        ackR n             = ackR $ ack' n
    
        ack' [] = []
        ack' (Ack 0 : n) = Ack 0 : ack' n
        ack' [Ack m]     = [Ack (m-1), Ack 0]
        ack' (Ack m : n) = Ack (m-1) : ack' (Ack m : decr n)
    
        decr (Ack 0 : n) = n
        decr n           = decr $ ack' n
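
    Another way to take control of the recursion, sketched here as an illustration rather than as the code above: keep the pending m values on an explicit stack (a plain list) and carry n as an accumulator. The name ackStack is hypothetical, and no timing is claimed for it.

```haskell
-- Ackermann with an explicit stack of pending first arguments.
-- The list holds the m values still waiting for their second argument;
-- n is the value computed so far.
ackStack :: Int -> Int -> Int
ackStack m0 n0 = go [m0] n0
  where
    go :: [Int] -> Int -> Int
    go []       n = n                            -- nothing pending: n is the result
    go (0 : ms) n = go ms (n + 1)                -- A(0, n) = n + 1
    go (m : ms) 0 = go (m - 1 : ms) 1            -- A(m, 0) = A(m-1, 1)
    go (m : ms) n = go (m : m - 1 : ms) (n - 1)  -- A(m, n) = A(m-1, A(m, n-1))
```

    Because go is tail recursive, the pending work lives in the list on the heap rather than on the call stack; for example, ackStack 2 3 evaluates to 9.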
    
  • 2020-12-23 14:05

    This performance issue (apart from the GHC RTS bug, obviously) seems to be fixed now on OS X 10.8 after updating Apple Xcode to 4.6.2. I can still reproduce it on Linux (though I have been testing with the GHC LLVM backend there), but no longer on OS X. Before the Xcode update I could reproduce the issue on the Mac; I don't have the numbers, but they were quite bad. Judging from what I remember of the pre-update object dumps, the new Xcode version substantially changed the code that GHC's backend generates for Ackermann, and for the better.

    Now, both C and GHC versions are quite close. C code:

    int ack(int m,int n){
    
      if(m==0) return n+1;
      if(n==0) return ack(m-1,1);
      return ack(m-1, ack(m,n-1));
    
    }
    

    Time to execute ack(4,1):

    GCC 4.8.0: 2.94s
    Clang 4.1: 4s
    

    Haskell code:

    ack :: Int -> Int -> Int
    ack 0 n = n+1
    ack m 0 = ack (m-1) 1
    ack m n = ack (m-1) (ack m (n-1))
    

    Time to execute ack 4 1 (with +RTS -kc1M):

    GHC 7.6.1 Native: 3.191s
    GHC 7.6.1 LLVM: 3.8s 
    

    All were compiled with the -O2 flag (plus the -rtsopts flag for GHC, as a workaround for the RTS bug). It is quite a head-scratcher, though: updating Xcode seems to have made a big difference to how well GHC optimizes Ackermann.

  • 2020-12-23 14:10

    The following is an idiomatic version that takes advantage of Haskell's laziness and GHC's optimisation of constant top-level expressions.

    acks :: [[Int]]
    acks = [ [ case (m, n) of
                    (0, _) -> n + 1
                    (_, 0) -> acks !! (m - 1) !! 1
                    (_, _) -> acks !! (m - 1) !! (acks !! m !! (n - 1))
             | n <- [0..] ]
           | m <- [0..] ]
    
    main :: IO ()
    main = print $ acks !! 4 !! 1
    

    Here, we're lazily building a matrix for all the values of the Ackermann function. As a result, subsequent calls to acks will not recompute anything (i.e. evaluating acks !! 4 !! 1 again will not double the running time).

    Although this is not the fastest solution, it looks a lot like the naïve implementation, it is very efficient in terms of memory use, and it recasts one of Haskell's weirder features (laziness) as a strength.

  • 2020-12-23 14:11

    This version uses some properties of the Ackermann function. It's not equivalent to the other versions, but it's fast:

    ackermann :: Int -> Int -> Int
    ackermann 0 n = n + 1
    ackermann m 0 = ackermann (m - 1) 1
    ackermann 1 n = n + 2
    ackermann 2 n = 2 * n + 3
    ackermann 3 n = 2 ^ (n + 3) - 3
    ackermann m n = ackermann (m - 1) (ackermann m (n - 1))
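
    The shortcuts for m = 1, 2 and 3 are closed forms of the recursive definition. A small sanity check, as a sketch with hypothetical names (ackNaive, ackClosed, closedFormsAgree), compares them against the plain recursion on small inputs only:

```haskell
-- Plain recursive definition, used only as a reference on small inputs.
ackNaive :: Integer -> Integer -> Integer
ackNaive 0 n = n + 1
ackNaive m 0 = ackNaive (m - 1) 1
ackNaive m n = ackNaive (m - 1) (ackNaive m (n - 1))

-- Closed forms for m <= 3; Nothing where no shortcut is given.
ackClosed :: Integer -> Integer -> Maybe Integer
ackClosed 0 n = Just (n + 1)
ackClosed 1 n = Just (n + 2)
ackClosed 2 n = Just (2 * n + 3)
ackClosed 3 n = Just (2 ^ (n + 3) - 3)
ackClosed _ _ = Nothing

-- True when every closed form matches the recursion for m in 0..3, n in 0..5.
closedFormsAgree :: Bool
closedFormsAgree =
  and [ ackClosed m n == Just (ackNaive m n) | m <- [0 .. 3], n <- [0 .. 5] ]
```

    The range is kept small on purpose, since the reference function slows down quickly as m and n grow.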
    

    Edit: And this is a version with memoization. We see that it's easy to memoize a function in Haskell; the only change is at the call site:

    import Data.Function.Memoize
    
    ackermann :: Integer -> Integer -> Integer
    ackermann 0 n = n + 1
    ackermann m 0 = ackermann (m - 1) 1
    ackermann 1 n = n + 2
    ackermann 2 n = 2 * n + 3
    ackermann 3 n = 2 ^ (n + 3) - 3
    ackermann m n = ackermann (m - 1) (ackermann m (n - 1))
    
    main :: IO ()
    main = print $ memoize2 ackermann 4 2
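
    Applying memoize2 at the call site caches the outer call; the recursive calls inside ackermann still go to the unmemoized function. A minimal sketch of caching the recursive calls as well, assuming Data.Map.Strict from the containers package and hypothetical names (Table, ackMemo), neither of which comes from the code above:

```haskell
import qualified Data.Map.Strict as M

-- Table of already computed results, keyed by (m, n).
type Table = M.Map (Integer, Integer) Integer

-- Ackermann with a table threaded through every recursive call.
ackMemo :: Integer -> Integer -> Integer
ackMemo m n = fst (go m n M.empty)
  where
    go :: Integer -> Integer -> Table -> (Integer, Table)
    go 0 k t = (k + 1, t)
    go j k t =
      case M.lookup (j, k) t of
        Just v  -> (v, t)                      -- cache hit: reuse the stored value
        Nothing ->
          let (v, t') = compute j k t
          in (v, M.insert (j, k) v t')         -- cache miss: compute, then store

    -- The two recursive cases, passing the table along.
    compute :: Integer -> Integer -> Table -> (Integer, Table)
    compute j 0 t = go (j - 1) 1 t
    compute j k t =
      let (inner, t1) = go j (k - 1) t
      in go (j - 1) inner t1
```

    This still recurses as deeply as the plain definition, so it is meant for small arguments such as ackMemo 3 3, not as a replacement for the closed forms.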
    
  • 2020-12-23 14:14

    I don't see that this is a bug at all. GHC just isn't taking advantage of the fact that 4 and 1 are the only arguments the function will ever be called with; to put it bluntly, it doesn't cheat. It also doesn't do constant math for you, so if you had written main = print $ ack (2+2) 1, it wouldn't have calculated that 2+2 = 4 until runtime. GHC has much more important things to think about. Help for the latter difficulty is available if you care for it: http://hackage.haskell.org/package/const-math-ghc-plugin.

    So GHC is helped if you do a little math; e.g. this is at least a hundred times as fast as your C program with 4 and 1 as arguments. But try it with 4 and 2:

    main = print $ ack 4 2 where
    
        ack :: Int -> Integer -> Integer
        ack 0 n = n + 1
        ack 1 n = n + 2 
        ack 2 n = 2 * n + 3
        ack m 0 = ack (m-1) 1
        ack m n = ack (m-1) (ack m (n-1) )
    

    This will give the right answer, all ~20,000 digits, in under a tenth of a second, whereas gcc, with your algorithm, will take forever to give the wrong answer.
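
    To see why a fixed-width integer cannot hold that result: since A(3, n) = 2^(n+3) - 3, we get A(4, 1) = A(3, 13) = 65533 and A(4, 2) = A(3, 65533) = 2^65536 - 3. A short sketch with hypothetical names (ack42, digitCount, fitsInInt) checks the size directly:

```haskell
-- A(4, 2) written out through the closed form for m = 3.
ack42 :: Integer
ack42 = 2 ^ (65536 :: Int) - 3

-- Number of decimal digits in A(4, 2).
digitCount :: Int
digitCount = length (show ack42)

-- Whether the value fits in a machine-sized Int.
fitsInInt :: Bool
fitsInInt = ack42 <= fromIntegral (maxBound :: Int)
```

    Here digitCount evaluates to 19729 and fitsInInt to False, which is why the Integer result type matters in the version above.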

  • 2020-12-23 14:19

    It seems that there is some kind of bug involved. What GHC version are you using?

    With GHC 7, I get the same behavior as you do. The program consumes all available memory without producing any output.

    However, if I compile it with GHC 6.12.1 using just ghc --make -O2 Ack.hs, it works perfectly. It computes the result in 10.8s on my computer, while the plain C version takes 7.8s.

    I suggest you report this bug on the GHC website.
