I am learning Haskell and trying to understand Monads. I have two questions:
From what I understand, Monad is just another typeclass that declares ways to i
Let me start by pointing at the excellent "You could have invented monads" article. It illustrates how the Monad structure can naturally manifest while you are writing programs. But the tutorial doesn't mention IO
, so I will have a stab here at extending the approach.
Let us start with what you probably have already seen - the container monad. Let's say we have:
f, g :: Int -> [Int]
One way of looking at this is that it gives us a number of possible outputs for every possible input. What if we want all possible outputs for the composition of both functions? Giving all possibilities we could get by applying the functions one after the other?
Well, there's a function for that:
fg x = concatMap g $ f x
If we put this more general, we get
fg x = f x >>= g
xs >>= f = concatMap f xs
return x = [x]
Why would we want to wrap it like this? Well, writing our programs primarily using >>=
and return
gives us some nice properties - for example, we can be sure that it's relatively hard to "forget" solutions. We'd explicitly have to reintroduce it, say by adding another function skip
. And also we now have a monad and can use all combinators from the monad library!
Now, let us jump to your trickier example. Let's say the two functions are "side-effecting". That's not non-deterministic, it just means that in theory the whole world is both their input (as it can influence them) as well as their output (as the function can influence it). So we get something like:
f, g :: Int -> RealWorld# -> (Int, RealWorld#)
If we now want f
to get the world that g
left behind, we'd write:
fg x rw = let (y, rw') = f x rw
(r, rw'') = g y rw'
in (r, rw'')
Or generalized:
fg x = f x >>= g
x >>= f = \rw -> let (y, rw') = x rw
(r, rw'') = f y rw'
in (r, rw'')
return x = \rw -> (x, rw)
Now if the user can only use >>=
, return
and a few pre-defined IO
values we get a nice property again: The user will never actually see the RealWorld#
getting passed around! And that is a very good thing, as you aren't really interested in the details of where getLine
gets its data from. And again we get all the nice high-level functions from the monad libraries.
So the important things to take away:
The monad captures common patterns in your code, like "always pass all elements of container A to container B" or "pass this real-world-tag through". Often, once you realize that there is a monad in your program, complicated things become simply applications of the right monad combinator.
The monad allows you to completely hide the implementation from the user. It is an excellent encapsulation mechanism, be it for your own internal state or for how IO
manages to squeeze non-purity into a pure program in a relatively safe way.
Appendix
In case someone is still scratching his head over RealWorld#
as much as I did when I started: There's obviously more magic going on after all the monad abstraction has been removed. Then the compiler will make use of the fact that there can only ever be one "real world". That's good news and bad news:
It follows that the compiler must guarantuee execution ordering between functions (which is what we were after!)
But it also means that actually passing the real world isn't necessary as there is only one we could possibly mean: The one that is current when the function gets executed!
Bottom line is that once execution order is fixed, RealWorld#
simply gets optimized out. Therefore programs using the IO
monad actually have zero runtime overhead. Also note that using RealWorld#
is obviously only one possible way to put IO
- but it happens to be the one GHC uses internally. The good thing about monads is that, again, the user really doesn't need to know.