After a few hours of debugging, I realized that a very simple toy example was not efficient due to a missing ! in an expression return $ 1 + x (thanks du
You ask: "return $ 1 + x [...] but how come ghc does not optimize that??"
The answer is that strict evaluation and lazy evaluation have subtly different semantics, so if GHC made that optimisation on its own, it could break your program.
The difference lies in the treatment of undefined values. Any attempt to evaluate an undefined value throws an exception. In GHCi:
Prelude> undefined
*** Exception: Prelude.undefined
CallStack (from HasCallStack):
error, called at libraries/base/GHC/Err.hs:79:14 in base:GHC.Err
undefined, called at <interactive>:1:1 in interactive:Ghci1
If an expression contains an undefined, evaluating it throws the same exception:
Prelude> 2 + undefined
*** Exception: Prelude.undefined [...]
However, if evaluation never reaches the undefined, everything is fine:
Prelude> True || undefined
True
Haskell uses "non-strict semantics" and "lazy evaluation". Technically, non-strict semantics are part of the definition of Haskell and lazy evaluation is the mechanism GHC uses to implement them, but you can think of them as synonyms. When you define a variable, its value is not computed immediately, so if you never use the variable you have no problem:
Prelude> let b = undefined
Prelude> b
*** Exception: Prelude.undefined
The let works fine, but evaluating the variable it defines throws an exception.
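To make the semantic difference concrete, here is a minimal sketch as a standalone module (the names lazyLet, strictLet and throwsError are made up for this illustration). It compares an ordinary let with one made strict by a bang pattern, which is roughly what GHC would be doing if it decided on its own to evaluate bindings eagerly:

```haskell
{-# LANGUAGE BangPatterns #-}
import Control.Exception (ErrorCall, evaluate, try)

-- Ordinary (lazy) let: 'b' is bound to a thunk that is never forced,
-- so the whole expression is simply 42.
lazyLet :: Int
lazyLet = let b = undefined :: Int in 42

-- Strict let via a bang pattern: 'b' is forced before the body runs,
-- so evaluating this expression throws Prelude.undefined.
strictLet :: Int
strictLet = let !b = undefined :: Int in 42

-- Hypothetical helper: True if forcing the value to weak head normal form
-- throws an ErrorCall (the exception 'undefined' raises).
throwsError :: Int -> IO Bool
throwsError x = do
  r <- try (evaluate x) :: IO (Either ErrorCall Int)
  pure (either (const True) (const False) r)

main :: IO ()
main = do
  print lazyLet                     -- 42
  throwsError strictLet >>= print   -- True
```

The only difference between the two definitions is when b is evaluated, yet one returns a value and the other fails, so an optimiser cannot make that change silently.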
Now consider your towering stack of unevaluated 1+ calls. GHC has no way of knowing in advance whether you are ever going to use the result (see below), and it also has no way of knowing whether there is an exception lurking somewhere in there. As a programmer, you might know that there is an exception and carefully avoid looking at the result, relying on Haskell's non-strict semantics. If GHC evaluated the stack prematurely and hit that exception, your program would fail when it should not have.
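The same point in miniature, as a hypothetical sketch (stepLazy, stepStrict and throwsError are invented names): a loop that builds a stack of unevaluated 1 + thunks, next to a copy that forces the accumulator at every step. If the starting value is undefined and the caller only looks at the step count, the lazy version succeeds and the eager one fails, which is exactly the kind of behaviour change GHC must not introduce by itself.

```haskell
{-# LANGUAGE BangPatterns #-}
import Control.Exception (ErrorCall, evaluate, try)

-- Returns (number of steps taken, accumulated value). The accumulator
-- grows as an unevaluated chain 1 + (1 + (... + start)).
stepLazy :: Int -> Int -> (Int, Int)
stepLazy n start = go n start 0
  where
    go :: Int -> Int -> Int -> (Int, Int)
    go 0 acc steps = (steps, acc)
    go k acc steps = go (k - 1) (1 + acc) (steps + 1)

-- Same loop, but the bang patterns force the accumulator at every step,
-- which is what an over-eager optimiser would effectively do.
stepStrict :: Int -> Int -> (Int, Int)
stepStrict n start = go n start 0
  where
    go :: Int -> Int -> Int -> (Int, Int)
    go 0 !acc steps = (steps, acc)
    go k !acc steps = go (k - 1) (1 + acc) (steps + 1)

-- True if forcing the value to weak head normal form throws an ErrorCall.
throwsError :: Int -> IO Bool
throwsError x = do
  r <- try (evaluate x) :: IO (Either ErrorCall Int)
  pure (either (const True) (const False) r)

main :: IO ()
main = do
  -- Only the step count is used, so the undefined start is never forced.
  print (fst (stepLazy 1000 undefined))                     -- 1000
  -- Forcing the accumulator early runs straight into the undefined.
  throwsError (fst (stepStrict 1000 undefined)) >>= print   -- True
```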
In fact, GHC includes an optimisation pass called the Demand Analyser (it used to be called the Strictness Analyser) which looks for opportunities to optimise in exactly the way you want. However, it has limits: it can only make a computation strict when it can prove that the result is going to be evaluated anyway.
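For comparison, here is a hedged sketch of a loop where that proof is easy. sumTo always ends by returning its accumulator, so forcing the result forces the accumulator; with optimisation on (-O), the demand analyser can typically discover this and evaluate the accumulator eagerly, while without -O the thunks still build up. sumTo' writes the same strictness out by hand with a bang pattern, so it does not depend on the optimiser. The function names are invented for this example, and both assume a non-negative argument.

```haskell
{-# LANGUAGE BangPatterns #-}

-- Sum of 1..n via an accumulator (assumes n >= 0).
-- Every terminating path returns 'acc', so the function is strict in it;
-- the demand analyser (enabled by -O) can usually prove this on its own.
sumTo :: Int -> Int
sumTo = go 0
  where
    go :: Int -> Int -> Int
    go acc 0 = acc
    go acc n = go (acc + n) (n - 1)

-- The same loop with the strictness made explicit, so no thunk chain
-- builds up even when compiling without optimisation.
sumTo' :: Int -> Int
sumTo' = go 0
  where
    go :: Int -> Int -> Int
    go !acc 0 = acc
    go !acc n = go (acc + n) (n - 1)

main :: IO ()
main = do
  print (sumTo 100)       -- 5050
  print (sumTo' 1000000)  -- 500000500000
```

By contrast, a loop that tucks its accumulator into a tuple component the caller may never inspect is not strict in it, so the analyser has to leave that accumulator lazy.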
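Another wrinkle here is that you have used the State monad. This actually comes in two variants: Lazy and Strict. The Lazy variant (the default) does not even force the (result, state) pair at each step. The Strict variant does sequence each step strictly, but, as the transformers documentation points out, it is still not strict in the state value itself: a new state written with put or modify is stored as a thunk unless you force it yourself, for example with put $!, modify', or a $! like the one you added.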
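A minimal sketch of how this plays out with the question's pattern, assuming the transformers package (it ships with GHC; mtl's Control.Monad.State.Strict offers the same get, put and modify' names). All three counters below use the Strict State monad; the function names are made up for illustration. incLazy stores 1 + x unevaluated, so depending on optimisation the final state can be a long chain of pending additions, whereas incStrict (with $!) and modify' keep the state evaluated at each step.

```haskell
import Control.Monad (replicateM_)
import Control.Monad.Trans.State.Strict (State, execState, get, modify', put)

-- The question's pattern without the '!': the new state is the thunk 1 + x.
-- Even in the Strict State monad nothing forces it, so thunks can pile up.
incLazy :: State Int ()
incLazy = do
  x <- get
  y <- return $ 1 + x
  put y

-- With '$!', 1 + x is evaluated when this step runs, before it is stored.
incStrict :: State Int ()
incStrict = do
  x <- get
  y <- return $! 1 + x
  put y

-- Run a step n times starting from 0 and return the final state.
countWith :: State Int () -> Int -> Int
countWith step n = execState (replicateM_ n step) 0

countLazy, countStrict, countModify :: Int -> Int
countLazy = countWith incLazy
countStrict = countWith incStrict
countModify = countWith (modify' (+ 1))  -- modify' forces the new state

main :: IO ()
main = do
  print (countLazy 10000)      -- 10000, possibly via a deep thunk chain
  print (countStrict 1000000)  -- 1000000
  print (countModify 1000000)  -- 1000000
```

All three produce the same number; the difference is only in how much unevaluated work is held in the state along the way.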