Why does this Haskell code run slower with -O?

心在旅途 2020-12-24 04:18

This piece of Haskell code runs much slower with -O, but -O should be non-dangerous. Can anyone tell me what happened? If it matters, it i

1 answer
  • 2020-12-24 05:10

    I guess it is time this question gets a proper answer.

    What happened to your code with -O

    Let me zoom in on your main function and rewrite it slightly:

    main :: IO ()
    main = do
        [n, m] <- fmap (map read . words) getLine
        line <- getLine
        let nodes = listArray (0, n) . tonodes n . map (subtract 1) . map read . words $ line
        replicateM_ m $ query n nodes
    

    Clearly, the intention here is that the NodeArray is created once and then used in each of the m invocations of query.

    Unfortunately, GHC transforms this code to, effectively,

    main = do
        [n, m] <- fmap (map read . words) getLine
        line <- getLine
        replicateM_ m $ do
            let nodes = listArray (0, n) . tonodes n . map (subtract 1) . map read . words $ line
            query n nodes
    

    and you can immediately see the problem here.
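
    One commonly suggested way to keep the construction outside the loop is to bind the value with <- and evaluate instead of let, so that building it becomes an IO step that runs before replicateM_. The following is only a minimal sketch of that idea, not your actual program: buildTable and query are hypothetical stand-ins for tonodes and your query, and the sizes come from command-line arguments rather than from stdin.

```haskell
import Control.Exception (evaluate)
import Control.Monad (replicateM_)
import Data.Array (Array, listArray, (!))
import System.Environment (getArgs)

-- Hypothetical stand-in for the expensive structure (the question's
-- tonodes/NodeArray): prefix sums of 1..n stored at indices 0..n.
buildTable :: Int -> Array Int Int
buildTable n = listArray (0, n) (scanl (+) 0 [1 .. n])

-- Hypothetical query: prints the last entry of the table.
query :: Int -> Array Int Int -> IO ()
query n table = print (table ! n)

main :: IO ()
main = do
    args <- getArgs
    -- Assumed defaults when two numeric arguments are not supplied.
    let (n, m) = case map read args of
                   [a, b] -> (a, b)
                   _      -> (100000, 5)
    -- evaluate forces the array to weak head normal form as an IO step
    -- that runs here, before the loop starts; the loop body only
    -- receives the resulting value.
    table <- evaluate (buildTable n)
    replicateM_ m (query n table)
```

    The point of the design is that table is now the result of an action rather than a let-bound expression, so there is no expression left for the optimizer to move into the repeated action.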

    What is the state hack, and why does it destroy my program's performance?

    The reason is the state hack, which says (roughly): “When something is of type IO a, assume it is called only once.” The official documentation is not much more elaborate:

    -fno-state-hack

    Turn off the "state hack" whereby any lambda with a State# token as argument is considered to be single-entry, hence it is considered OK to inline things inside it. This can improve performance of IO and ST monad code, but it runs the risk of reducing sharing.
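
    If you want to check whether the state hack is responsible in a particular module, the flag can be set per module with an OPTIONS_GHC pragma. The sketch below only shows where the pragma goes; expensiveSum is a hypothetical placeholder for real work, and the module is assumed to be compiled with ghc -O as in the question.

```haskell
{-# OPTIONS_GHC -fno-state-hack #-}
-- The pragma above affects only this module; passing -fno-state-hack on
-- the ghc command line instead applies it to every module being compiled.
module Main (main) where

import Control.Monad (replicateM_)
import System.Environment (getArgs)

-- Hypothetical expensive value, standing in for the question's node array.
expensiveSum :: Int -> Int
expensiveSum n = sum [1 .. n]

main :: IO ()
main = do
    args <- getArgs
    -- Assumed default size when no argument is given.
    let n = case args of
              (a : _) -> read a
              []      -> 1000000
        total = expensiveSum n   -- intended to be computed once and shared
    replicateM_ 3 (print total)
```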

    Roughly, the idea is as follows: If you define a function with an IO type and a where clause, e.g.

    foo x = do
        putStrLn y
        putStrLn y
      where y = ...x...
    

    Something of type IO a can be viewed as something of type RealWorld -> (RealWorld, a). In that view, the above becomes (roughly)

    foo x = 
       let y = ...x... in 
       \world1 ->
         let (world2, ()) = putStrLn y world1
             (world3, ()) = putStrLn y world2
         in  (world3, ())
    

    A call to foo would (typically) look like this: foo argument world. But the definition of foo only takes one argument, and the other one is only consumed later by a local lambda expression! That is going to be a very slow call to foo. It would be much faster if the code looked like this:

    foo x world1 = 
       let y = ...x... in 
       let (world2, ()) = putStrLn y world1
           (world3, ()) = putStrLn y world2
       in  (world3, ())
    

    This is called eta-expansion, and it is done on various grounds (e.g. by analyzing the function’s definition, by checking how it is being called, and, in this case, by type-directed heuristics).

    Unfortunately, this degrades performance if the call to foo is actually of the form let fooArgument = foo argument, i.e. with an argument, but no world passed (yet). In the original code, if fooArgument is then used several times, y will still be calculated only once, and shared. In the modified code, y will be re-calculated every time – precisely what has happened to your nodes.
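
    The difference between the two shapes of foo can be played with in a toy model. The sketch below is an assumption-laden illustration, not how GHC represents IO: the world is just the list of lines printed so far, putLine models putStrLn, and expensive is a hypothetical stand-in for ...x... that reports each evaluation via Debug.Trace.

```haskell
import Debug.Trace (trace)

-- Toy model (an assumption for illustration): the "world" is the list
-- of lines printed so far, and an action maps a world to a new world.
type World = [String]
type Action a = World -> (World, a)

-- Model of putStrLn: appends one line to the world.
putLine :: String -> Action ()
putLine s w = (w ++ [s], ())

-- Hypothetical stand-in for "...x...": reports on stderr when evaluated.
expensive :: Int -> String
expensive x = trace "computing y" (show (sum [1 .. x]))

-- One-argument form: y is bound outside the lambda, so a partial
-- application (fooShared x) can share y between runs.
fooShared :: Int -> Action ()
fooShared x =
    let y = expensive x
    in \world1 ->
        let (world2, ()) = putLine y world1
            (world3, ()) = putLine y world2
        in (world3, ())

-- Eta-expanded form: y is bound only once the world has been supplied,
-- so every run of the action builds its own y.
fooExpanded :: Int -> Action ()
fooExpanded x world1 =
    let y = expensive x
        (world2, ()) = putLine y world1
        (world3, ()) = putLine y world2
    in (world3, ())

-- Run the same (partially applied) action twice, threading the world.
runTwice :: Action () -> World
runTwice act =
    let (w1, ()) = act []
        (w2, ()) = act w1
    in w2

main :: IO ()
main = do
    mapM_ putStrLn (runTwice (fooShared 10))
    mapM_ putStrLn (runTwice (fooExpanded 10))
```

    Both versions produce the same final world; they differ only in how often y is computed. Compiled without optimization, one would expect the trace message once for fooShared and twice for fooExpanded, though the optimizer is free to change that, which is exactly the issue discussed here.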

    Can things be fixed?

    Possibly. See #9388 for an attempt at doing so. The problem with fixing it is that it will cost performance in a lot of cases where the transformation happens to be ok, even though the compiler cannot possibly know that for sure. And there are probably cases where it is technically not ok, i.e. sharing is lost, but it is still beneficial because the speedups from the faster calling outweigh the extra cost of the recalculation. So it is not clear where to go from here.
