Ackermann very inefficient with Haskell/GHC

I tried computing Ackermann(4,1), and there's a big difference in performance between different languages/compilers. Below are results on my

    Writing the algorithm in Haskell in a way that looks similar to the way you wrote it in C is not the same algorithm, because the semantics of recursion are quite different.

    Here is a version using the same mathematical algorithm, but where we represent calls to the Ackermann function symbolically using a data type. That way, we can control the semantics of the recursion more precisely.

    When compiled with optimization, this version runs in constant memory, but it is slow - about 4.5 minutes in an environment similar to yours. But I'm sure it could be modified to be much faster. This is just to give the idea.

    -- A pending call to the Ackermann function, represented symbolically.
    data Ack = Ack !Int

    ack :: Int -> Int -> Int
    ack m n = length . ackR $ Ack m : replicate n (Ack 0)
      where
        -- Rewrite until the head of the list is fully reduced.
        ackR n@(Ack 0 : _) = n
        ackR n             = ackR $ ack' n

        ack' [] = []
        ack' (Ack 0 : n) = Ack 0 : ack' n
        ack' [Ack m]     = [Ack (m-1), Ack 0]
        ack' (Ack m : n) = Ack (m-1) : ack' (Ack m : decr n)

        decr (Ack 0 : n) = n
        decr n           = decr $ ack' n

    This performance issue (apart from the GHC RTS bug, obviously) seems to be fixed now on OS X 10.8 after updating Apple's Xcode to 4.6.2. I can still reproduce it on Linux (where I have been testing with the GHC LLVM backend, though), but no longer on OS X.

    I was able to reproduce the performance issue on the Mac before the Xcode update. I don't have the numbers, but they were surely quite bad. The new version seems to have changed GHC's backend code generation for Ackermann substantially (from what I remember of the object dumps I looked at before the update), so it seems that the Xcode update improved the code GHC generates for Ackermann.

    Now, both C and GHC versions are quite close. C code:

    int ack(int m,int n){
      if(m==0) return n+1;
      if(n==0) return ack(m-1,1);
      return ack(m-1, ack(m,n-1));
    }

    Time to execute ack(4,1):

    GCC 4.8.0: 2.94s
    Clang 4.1: 4s

    Haskell code:

    ack :: Int -> Int -> Int
    ack 0 n = n+1
    ack m 0 = ack (m-1) 1
    ack m n = ack (m-1) (ack m (n-1))

    Time to execute ack 4 1 (with +RTS -kc1M):

    GHC 7.6.1 Native: 3.191s
    GHC 7.6.1 LLVM: 3.8s 

    All were compiled with the -O2 flag (and, for GHC, the -rtsopts flag as a workaround for the RTS bug). It is quite a head scratcher, though. Updating Xcode seems to have made a big difference to how well GHC optimizes Ackermann.
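
    For anyone who wants to reproduce these timings, here is a minimal sketch of a complete module around the function above. The file name Ack.hs, the command-line argument handling, and the small default input are my own assumptions for illustration; the -O2, -rtsopts and +RTS -kc1M flags are the ones quoted above.

```haskell
-- Ack.hs (hypothetical file name)
-- Build: ghc -O2 -rtsopts Ack.hs
-- Run:   ./Ack 4 1 +RTS -kc1M

import System.Environment (getArgs)
import Text.Read (readMaybe)

-- Direct transcription of the recursive definition.
ack :: Int -> Int -> Int
ack 0 n = n + 1
ack m 0 = ack (m - 1) 1
ack m n = ack (m - 1) (ack m (n - 1))

main :: IO ()
main = do
  args <- getArgs
  -- Expect exactly two integer arguments, m and n.
  case mapM readMaybe args of
    Just [m, n] -> print (ack m n)
    _           -> print (ack 2 3)  -- small default so a bare run finishes at once
```

    Reading m and n from the command line also keeps the inputs out of the source, so the comparison with C does not depend on what the compiler might do with constant arguments.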

    The following is an idiomatic version that takes advantage of Haskell's laziness and GHC's optimisation of constant top-level expressions.

    acks :: [[Int]]
    acks = [ [ case (m, n) of
                    (0, _) -> n + 1
                    (_, 0) -> acks !! (m - 1) !! 1
                    (_, _) -> acks !! (m - 1) !! (acks !! m !! (n - 1))
             | n <- [0..] ]
           | m <- [0..] ]
    main :: IO ()
    main = print $ acks !! 4 !! 1

    Here, we're lazily building a matrix for all the values of the Ackermann function. As a result, subsequent calls to acks will not recompute anything (i.e. evaluating acks !! 4 !! 1 again will not double the running time).
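
    A quick way to convince yourself that the table agrees with the direct recursion is to compare the two on small inputs. This is only a sketch: the names ackNaive and tableAgrees are mine, and the ranges are kept small on purpose because !! on lists is linear in the index.

```haskell
-- Lazily built table, same definition as above.
acks :: [[Int]]
acks = [ [ case (m, n) of
                (0, _) -> n + 1
                (_, 0) -> acks !! (m - 1) !! 1
                (_, _) -> acks !! (m - 1) !! (acks !! m !! (n - 1))
         | n <- [0..] ]
       | m <- [0..] ]

-- Direct recursion, used only as a reference (hypothetical helper name).
ackNaive :: Int -> Int -> Int
ackNaive 0 n = n + 1
ackNaive m 0 = ackNaive (m - 1) 1
ackNaive m n = ackNaive (m - 1) (ackNaive m (n - 1))

-- True when the table and the direct recursion agree on a small grid.
tableAgrees :: Bool
tableAgrees =
  and [ acks !! m !! n == ackNaive m n | m <- [0 .. 3], n <- [0 .. 4] ]

main :: IO ()
main = print tableAgrees
```

    Because acks is a top-level constant, every cell forced by the check stays evaluated, so later lookups of the same cells are plain list traversals.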

    Although this is not the fastest solution, it looks a lot like the naïve implementation, it is very efficient in terms of memory use, and it recasts one of Haskell's weirder features (laziness) as a strength.

    This version uses some properties of the Ackermann function. It's not equivalent to the other versions, but it's fast:

    ackermann :: Int -> Int -> Int
    ackermann 0 n = n + 1
    ackermann m 0 = ackermann (m - 1) 1
    ackermann 1 n = n + 2
    ackermann 2 n = 2 * n + 3
    ackermann 3 n = 2 ^ (n + 3) - 3
    ackermann m n = ackermann (m - 1) (ackermann m (n - 1))
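
    The three shortcut equations are not arbitrary: each follows from unrolling the recurrence one level and solving the resulting simple recurrence. As a sketch of the derivation, writing A(m,n) for the function (my notation, not part of the code above):

```latex
\begin{align*}
A(1,n) &= A(0,\,A(1,n-1)) = A(1,n-1) + 1, \quad A(1,0) = A(0,1) = 2
  &&\Longrightarrow\quad A(1,n) = n + 2 \\
A(2,n) &= A(1,\,A(2,n-1)) = A(2,n-1) + 2, \quad A(2,0) = A(1,1) = 3
  &&\Longrightarrow\quad A(2,n) = 2n + 3 \\
A(3,n) &= A(2,\,A(3,n-1)) = 2\,A(3,n-1) + 3, \quad A(3,0) = A(2,1) = 5
  &&\Longrightarrow\quad A(3,n) + 3 = 2^{n}\,\bigl(A(3,0) + 3\bigr) = 2^{n+3}
\end{align*}
```

    The last line gives A(3,n) = 2^(n+3) - 3, which matches the third shortcut.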

    Edit: And this is a version with memoization. We see that it's easy to memoize a function in Haskell; the only change is at the call site:

    import Data.Function.Memoize
    ackermann :: Integer -> Integer -> Integer
    ackermann 0 n = n + 1
    ackermann m 0 = ackermann (m - 1) 1
    ackermann 1 n = n + 2
    ackermann 2 n = 2 * n + 3
    ackermann 3 n = 2 ^ (n + 3) - 3
    ackermann m n = ackermann (m - 1) (ackermann m (n - 1))
    main :: IO ()
    main = print $ memoize2 ackermann 4 2

    I don't see that this is a bug at all. GHC just isn't taking advantage of the fact that it knows that 4 and 1 are the only arguments the function will ever be called with; that is, to put it bluntly, it doesn't cheat. It also doesn't do constant math for you, so if you had written main = print $ ack (2+2) 1, it wouldn't have calculated that 2+2 = 4 until runtime. GHC has much more important things to think about. Help for the latter difficulty is available if you care for it.

    So GHC is helped if you do a little math; e.g. this is at least a hundred times as fast as your C program with 4 and 1 as arguments. But try it with 4 and 2:

    main = print $ ack 4 2 where
        ack :: Int -> Integer -> Integer
        ack 0 n = n + 1
        ack 1 n = n + 2 
        ack 2 n = 2 * n + 3
        ack m 0 = ack (m-1) 1
        ack m n = ack (m-1) (ack m (n-1) )

    This will give the right answer, all ~20,000 digits, in under a tenth of a second, whereas gcc, with your algorithm, will take forever to give the wrong answer.
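
    To make the "wrong answer" remark concrete: ack 4 2 equals 2^65536 - 3, which only fits in an arbitrary-precision type. The sketch below uses that closed form rather than the recursion, and the names exact, digits and wrapped are mine. It contrasts Integer with GHC's fixed-width Int, which stands in here for C's int (where signed overflow is formally undefined rather than wrapping).

```haskell
-- Exact value of ack 4 2, taken from the closed form 2^65536 - 3.
exact :: Integer
exact = 2 ^ (65536 :: Int) - 3

-- Number of decimal digits in the exact value.
digits :: Int
digits = length (show exact)

-- The same expression in fixed-width Int; in GHC the arithmetic wraps
-- around silently, so the result cannot equal the exact value.
wrapped :: Int
wrapped = 2 ^ (65536 :: Int) - 3

main :: IO ()
main = do
  print digits                        -- size of the true answer
  print (toInteger wrapped == exact)  -- does the fixed-width result match?
```

    The digit count comes out a little under 20,000, consistent with the figure above, while the Int result does not match the exact value.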

    It seems that there is some kind of bug involved. What GHC version are you using?

    With GHC 7, I get the same behavior as you do. The program consumes all available memory without producing any output.

    However, if I compile it with GHC 6.12.1 using just ghc --make -O2 Ack.hs, it works perfectly. It computes the result in 10.8s on my computer, while the plain C version takes 7.8s.

    I suggest you report this bug on the GHC website.
