Evaluation and space leaks in Haskell

Submitted by 半城伤御伤魂 on 2019-12-03 11:35:15

This is because Control.Monad.State re-exports Control.Monad.State.Lazy. If you imported Control.Monad.State.Strict instead, both would overflow the same way.
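A minimal sketch of the difference, assuming the transformers package's Control.Monad.Trans.State.Lazy module (mtl's Control.Monad.State.Lazy is built on it). The names tick and bigN are hypothetical. With the lazy variant, the first few results of a very large replicateM can be read without running the rest:

```haskell
import Control.Monad (replicateM)
import qualified Control.Monad.Trans.State.Lazy as Lazy

-- Hypothetical action: return the current counter and increment it.
tick :: Lazy.State Int Int
tick = Lazy.state (\s -> (s, s + 1))

-- Hypothetical, deliberately huge repetition count.
bigN :: Int
bigN = 1000000000

-- Only three elements are demanded and evalState discards the final
-- state, so the remaining iterations are never run.
firstThree :: [Int]
firstThree = take 3 (Lazy.evalState (replicateM bigN tick) 0)

main :: IO ()
main = print firstThree  -- [0,1,2]
```

Swapping the qualified import for Control.Monad.Trans.State.Strict makes the same expression unroll all bigN steps before take sees the first element, which is the behaviour the strict module shares with IO.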

The reason it overflows with strict State or IO is that replicateM needs to run the action iterations times, recursively, before it can return any part of the list. To put it loosely, replicateM must "combine" the "effects" of all the actions it replicates into one giant "effect". The terms "combine" and "effect" are very vague and can mean an infinite number of different things, but they're about the best we've got for talking about such abstract things. With a large count, replicateM will end up overflowing the stack in nearly every choice of monad. What's bizarre is that it doesn't with lazy State.
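To make "combining effects" slightly more concrete, here is a small sketch using only base and the Maybe monad (the binding names are illustrative). The outer constructor of the result, Just or Nothing, depends on every action, so no element can be handed back until the last action has been checked:

```haskell
import Control.Monad (replicateM)

-- Every action succeeds, so the combined effect is success.
allSucceed :: Maybe [Int]
allSucceed = replicateM 3 (Just 7)

-- One failing action, even the last one, makes the whole result Nothing,
-- although the first two actions succeeded.
lastFails :: Maybe [Int]
lastFails = sequence [Just 1, Just 2, Nothing]

-- So even the first element cannot be handed out before every action has
-- been checked: the Just/Nothing decision depends on all of them.
firstOfLastFails :: Maybe [Int]
firstOfLastFails = take 1 <$> lastFails

main :: IO ()
main = do
  print allSucceed        -- Just [7,7,7]
  print lastFails         -- Nothing
  print firstOfLastFails  -- Nothing
```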

To see why it doesn't overflow with lazy State, you need to look into the details of (>>=) for lazy State, and replicateM. The following definitions are greatly simplified, but they reflect the details necessary to illustrate how this works.

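import Control.Monad (ap, liftM)

newtype State s a = State { runState :: s -> (a, s) }

-- Functor and Applicative are required superclasses of Monad in modern GHC.
instance Functor (State s) where
    fmap = liftM

instance Applicative (State s) where
    pure x = State $ \s -> (x, s)
    (<*>) = ap

instance Monad (State s) where
    -- The let is lazy: runState x s is only evaluated if a or s' is demanded.
    x >>= f = State $ \s -> let (a, s') = runState x s in runState (f a) s'

replicateM :: Monad m => Int -> m a -> m [a]
replicateM 0 _ = return []
replicateM n mx | n < 0 = error "don't do this"
                | otherwise =
                    mx >>= \x -> replicateM (n - 1) mx >>= \xs -> return (x:xs)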

So first, look at replicateM. Take note that when n is greater than 0, it is a call to (>>=). So the behavior of replicateM depends closely on what (>>=) does.

When you look at (>>=), you see it produces a state transition function that binds the results of the state transition function x in a let binding, then returns the result of the transition function that's the result of f applied to arguments from that binding.

OK, that statement was clear as mud, but it's really important. Let's just look inside the lambda for the moment. Looking at the result of the function that (>>=) creates, you see let {something to do with x} in {something to do with f and the results of the let binding}. This is important with lazy evaluation. It means that, just maybe, it can ignore x, or part of it, when it evaluates (>>=), if the particular function f allows it to. In the case of lazy State, it means that it might be able to delay calculating future state values, if f can produce a constructor before looking at the state.
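A minimal sketch of "ignoring x", reusing the simplified State from above. To keep it short, (>>=) is written as a plain function named bindS, and boom, lazyF and strictF are hypothetical names. The first action throws if it is ever run; whether it runs depends only on whether f looks at the state before producing its pair:

```haskell
import Control.Exception (SomeException, evaluate, try)

-- The simplified lazy State from above; (>>=) is written as a plain
-- function, bindS (a hypothetical name), to keep the sketch short.
newtype State s a = State { runState :: s -> (a, s) }

bindS :: State s a -> (a -> State s b) -> State s b
bindS x f = State $ \s -> let (a, s') = runState x s in runState (f a) s'

-- Hypothetical first action that throws if it is ever run.
boom :: State Int ()
boom = State (\_ -> error "x was evaluated")

-- This f produces its pair constructor without looking at the state ...
lazyF :: () -> State Int Char
lazyF _ = State (\s -> ('c', s))

-- ... while this f forces the incoming state first.
strictF :: () -> State Int Char
strictF _ = State (\s -> s `seq` ('c', s))

-- With lazyF, neither a nor s' is demanded, so boom is never run.
ignoresX :: Char
ignoresX = fst (runState (boom `bindS` lazyF) 0)

main :: IO ()
main = do
  print ignoresX  -- 'c'
  -- With strictF, forcing s' runs boom, so this reports the error.
  r <- try (evaluate (fst (runState (boom `bindS` strictF) 0)))
  putStrLn (either (\e -> "strictF: " ++ show (e :: SomeException)) show r)
```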

This turns out to be what makes it work. Because of the particular way replicateM assembles calls to (>>=), the result is a function that produces (:) constructors before examining the state passed to it. This allows the list to be processed incrementally, as long as the final state is never examined. If you ever look at the final state, that destroys the ability to work incrementally, because calculating the final state requires doing all the work. But your use of evalState threw the final state away unexamined, so evaluation was free to proceed incrementally.
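A sketch of that incremental behaviour with the simplified definitions above. The action step is hypothetical: it throws once the counter reaches 3, so the program only succeeds if the later iterations are never run:

```haskell
import Control.Monad (ap, liftM)

-- Simplified lazy State and replicateM, following the definitions above.
newtype State s a = State { runState :: s -> (a, s) }

instance Functor (State s) where
  fmap = liftM

instance Applicative (State s) where
  pure x = State $ \s -> (x, s)
  (<*>) = ap

instance Monad (State s) where
  x >>= f = State $ \s -> let (a, s') = runState x s in runState (f a) s'

replicateM :: Monad m => Int -> m a -> m [a]
replicateM 0 _ = return []
replicateM n mx
  | n < 0 = error "don't do this"
  | otherwise = mx >>= \x -> replicateM (n - 1) mx >>= \xs -> return (x : xs)

-- Hypothetical action: return the counter and increment it, but throw if
-- it is ever run with a counter of 3 or more.
step :: State Int Int
step = State $ \s -> if s >= 3 then error "ran too far" else (s, s + 1)

-- Like evalState: keep the list and throw the final state away. Only the
-- first three steps ever run, so "ran too far" is never raised.
firstThree :: [Int]
firstThree = take 3 (fst (runState (replicateM 10 step) 0))

-- Asking for the final state instead, snd (runState (replicateM 10 step) 0),
-- would need every step's state and would therefore hit the error.

main :: IO ()
main = print firstThree  -- [0,1,2]
```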

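The culprit is hidden deep inside replicateM. Let's look at the source (as base defined it at the time; newer versions of base define replicateM directly in terms of Applicative, but it still builds the head, then the tail, then the cons).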

replicateM        :: (Monad m) => Int -> m a -> m [a]
replicateM n x    = sequence (replicate n x)

sequence       :: Monad m => [m a] -> m [a] 
sequence ms = foldr k (return []) ms where
  k m m' = do { x <- m; xs <- m'; return (x:xs) }
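The quoted definitions run as-is once renamed; sequence' and replicateM' below are hypothetical names chosen only to avoid clashing with the standard versions in Prelude and Control.Monad. Trying them on a couple of monads shows the "head, then tail, then cons" shape:

```haskell
-- sequence' and replicateM' are the definitions above, renamed
-- (hypothetical names) to avoid clashing with the standard versions.
sequence' :: Monad m => [m a] -> m [a]
sequence' ms = foldr k (return []) ms
  where
    -- Run the head action, then the rest, then cons the results.
    k m m' = do { x <- m; xs <- m'; return (x : xs) }

replicateM' :: Monad m => Int -> m a -> m [a]
replicateM' n x = sequence' (replicate n x)

main :: IO ()
main = do
  print (replicateM' 2 "ab")                       -- ["aa","ab","ba","bb"]
  print (replicateM' 3 (Just 'x'))                 -- Just "xxx"
  print (sequence' [Just 1, Nothing :: Maybe Int]) -- Nothing
```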

In particular, take a look at a single unrolling of the foldr in sequence, where roll' is the action being replicated. For n > 0,

foldr k (return []) (replicate n roll')

unrolls to

do x  <- roll'
   xs <- foldr k (return []) (replicate (n - 1) roll')
   return (x:xs)

In other words, unless we can lazily return (x : <thunk>) early, we'll unroll the entire replication before returning the first value. Whether we can return that value early depends on the definition of (>>=) in our monad, which is easier to see once the do block is desugared:

roll' >>= \x -> foldr k (return []) (replicate (n - 1) roll') >>= \xs -> return (x:xs)
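To see that it really is (>>=) that decides, this sketch runs the same fold with k in two monads from base. Identity is used here as a stand-in for a monad whose (>>=) forces nothing; it can return x : <thunk> immediately, even for an infinite list of actions. Maybe has to inspect every action before it can build the Just:

```haskell
import Data.Functor.Identity (Identity (..))

-- The k from the sequence definition above.
k :: Monad m => m a -> m [a] -> m [a]
k m m' = do { x <- m; xs <- m'; return (x : xs) }

-- Identity's (>>=) just applies the continuation, so k hands back
-- (x : <thunk>) before the rest of the fold is evaluated. The input list
-- is infinite, yet the first three results are still available.
identityPrefix :: [Int]
identityPrefix = take 3 (runIdentity (foldr k (return []) (map Identity [1 ..])))

-- Maybe's (>>=) must check each action for Nothing before the Just can be
-- built, so the same fold over an infinite list of Just values would never
-- return; a finite list has to be traversed completely.
maybeAll :: Maybe [Int]
maybeAll = foldr k (return []) (map Just [1 .. 5])

main :: IO ()
main = do
  print identityPrefix  -- [1,2,3]
  print maybeAll        -- Just [1,2,3,4,5]
```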

It's fair to say that since IO performs side effects, it's going to perform binds sequentially, so we're definitely going to unroll the whole thing. State has two forms: the Control.Monad.Trans.State.Lazy version and the Control.Monad.Trans.State.Strict version, and Control.Monad.Trans.State defaults to the Lazy version. In the Lazy version, (>>=) is defined as

m >>= k  = StateT $ \s -> do
    ~(a, s') <- runStateT m s
    runStateT (k a) s'

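So we can see the explicit irrefutable pattern ~(a, s') in the bind: the pair returned by runStateT m s is not forced until a or s' is actually demanded, which lets us proceed to return the result lazily. Since State s is StateT s Identity, and Identity's bind doesn't force anything either, the continuation can produce its (:) constructor before the rest of the replication has run.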
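A minimal sketch of what the ~ buys, assuming the transformers modules Control.Monad.Trans.State.Lazy and Control.Monad.Trans.State.Strict. The first action's result pair is deliberately undefined (a hypothetical value that must never be forced), and the continuation ignores it:

```haskell
import Control.Exception (SomeException, evaluate, try)
import qualified Control.Monad.Trans.State.Lazy as Lazy
import qualified Control.Monad.Trans.State.Strict as Strict

-- The lazy (>>=) matches ~(a, s'), so the undefined pair is never forced.
lazyResult :: Char
lazyResult =
  Lazy.evalState (Lazy.state (\_ -> undefined :: ((), Int)) >>= \_ -> return 'x') 0

-- The strict (>>=) matches (a, s') without ~, which forces the pair.
strictResult :: Char
strictResult =
  Strict.evalState (Strict.state (\_ -> undefined :: ((), Int)) >>= \_ -> return 'x') 0

main :: IO ()
main = do
  print lazyResult  -- 'x'
  r <- try (evaluate strictResult)
  putStrLn (either (\e -> "strict: " ++ show (e :: SomeException)) show r)
```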

It's worth taking a look at a recent review of this problem by Joachim Breitner. There's also a lot of work on this in the pipes and conduit ecosystems that might be worth examining.

In general, though, it's worth being suspicious of replicateM because of the sequencing we saw above: "build the head, then build the tail, then return the cons".
