Can anybody explain GHC's definition of IO?

悲哀的现实 2021-01-13 10:23

The title is pretty self-descriptive, but there's one part that caught my attention:

newtype IO a = IO (State# RealWorld -> (# State# RealWorld, a #))
         


        
3 answers
  •  被撕碎了的回忆
    2021-01-13 11:24

    So in practice, an IO x is just some program (i.e. a schedule of CPU instructions, interrupts, whatever) which, when it's done executing, hands us a Haskell data structure of type x. The way that Haskell I/O works is by saying, "we will (functionally) describe how to construct the program which does stuff, and then GHC will do its thing, you'll get that program, and then it's up to you to actually run it." The resulting program basically looks like an interleaving:

    [IO stuff] -> [Haskell code] -> [IO stuff] -> ...
    

    and it's written functionally as a composition of a bunch of purely functional [Haskell code] -> [IO stuff] blocks.
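
The idea that an IO value is only a description of a program can be seen in a small sketch. The names `demo`, `say` and `plan` are made up for illustration; the point is that building an action does nothing until it is sequenced into the program that actually runs.

```haskell
import Data.IORef (modifyIORef, newIORef, readIORef)

-- Hypothetical demo: actions are ordinary values that can be stored,
-- reordered, or dropped before anything is executed.
demo :: IO [String]
demo = do
  logRef <- newIORef []
  let say s = modifyIORef logRef (++ [s])   -- describes an effect, runs nothing yet
      _neverRun = say "never sequenced"     -- built, but never part of the program
      plan = [say "first", say "second"]    -- a plain list of descriptions
  sequence_ (reverse plan)                  -- only now do the effects happen
  readIORef logRef
```

Running `demo` yields `["second","first"]`: the order is decided by how the descriptions are composed, and the action that was never sequenced leaves no trace.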

    Now, how can we model this with a real type? One clever way is to accumulate all of the commands that you can send to the underlying OS as Request data structures, and the responses that the OS can send back as Response data structures. You can then model those blocks as functions between a list of requests and a list of responses. Here's a simple version of that model, heavily exploiting laziness:

    type IO x = [Response] -> ([Request], x)
    

    The OS now provides this function with a lazy list of responses -- don't call the head of it just yet, you have to first cons something onto the outgoing requests! -- and you produce this pair of a lazy list of requests and a lazy result. The OS reads your first request, does it, and provides the result as the first element of the response list. In this way you sort of get a fixed point operator. Now we see what return and bind look like:

     -- return needs to yield a special symbol of type Request which stops the 
     -- process of querying the OS.
     return x = \_responses -> ([Done], x)  -- ignores whatever the OS sends back
    
     -- bind needs to split the responses between those fed to mx and the rest;
     -- assume that every request yields exactly one response, so we can examine
     -- just the length of x_requests.
     bind :: ([Response] -> ([Request], x)) -> 
             (x -> [Response] -> ([Request], y)) -> 
             [Response] -> ([Request], y)
     bind mx x_to_my responses = (init x_requests ++ y_requests, y)
         where (x_requests, x) = mx responses
               (y_requests, y) = x_to_my x $ drop (length x_requests - 1) responses
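
To see the knot being tied, here is a minimal runnable sketch of the same model. The `Request` and `Response` constructors, the `Dialog` synonym, and the pure stand-in for the OS (`serve`, `runDialog`) are all assumptions made for illustration; the answer above leaves them abstract.

```haskell
-- Hypothetical request/response vocabulary.
data Request = PutLine String | GetLine | Done deriving (Show, Eq)
data Response = Ok | Line String deriving (Show, Eq)

type Dialog x = [Response] -> ([Request], x)

ret :: x -> Dialog x
ret x _ = ([Done], x)

-- Same shape as the bind above: one response per request, Done excluded.
bind :: Dialog x -> (x -> Dialog y) -> Dialog y
bind mx k responses = (init xReqs ++ yReqs, y)
  where
    (xReqs, x) = mx responses
    (yReqs, y) = k x (drop (length xReqs - 1) responses)

putLine :: String -> Dialog ()
putLine s _ = ([PutLine s, Done], ())

-- The response is inspected lazily, only when the line is demanded.
getLine' :: Dialog String
getLine' rs = ([GetLine, Done], line)
  where
    line = case rs of
      (Line s : _) -> s
      _ -> error "protocol violation"

-- A pure stand-in for the OS: answers requests from a list of input lines.
serve :: [Request] -> [String] -> ([Response], [String])
serve (PutLine s : rs) ins = let (resps, outs) = serve rs ins in (Ok : resps, s : outs)
serve (GetLine : rs) (i : ins) = let (resps, outs) = serve rs ins in (Line i : resps, outs)
serve (GetLine : _) [] = error "out of input"
serve (Done : _) _ = ([], [])
serve [] _ = ([], [])

-- Ties the knot: the program's requests feed the OS, whose responses feed the program.
runDialog :: Dialog x -> [String] -> ([String], x)
runDialog prog input = (outputs, result)
  where
    (requests, result) = prog responses
    (responses, outputs) = serve requests input

greet :: Dialog ()
greet = putLine "Name?" `bind` \_ -> getLine' `bind` \name -> putLine ("Hello, " ++ name)
```

In GHCi, `runDialog greet ["Ada"]` evaluates to `(["Name?","Hello, Ada"],())`. It only works because neither side forces the other's list further than what has already been produced, which is the laziness the model leans on.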
    

    This should be correct but it's a little confusing. A little less confusing is to imagine a state monad with "the real world" inside, but unfortunately that is incorrect:

    newtype IO x = RawIO { runIO :: RealWorld -> (RealWorld, x) }
    

    What's wrong with this? Basically it's the fact that the original RealWorld persists. We might for example write:

    RawIO $ \world -> let (world1, x) = runIO (putStrLn "Name?" >> getLine) world
                          (world2, y) = runIO (putStrLn "Age?" >> getLine) world
                      in (world1, y)
    

    What does this do? It performs the computation in a branching universe: in world #1 it asks one question (Name?) and in world #2 it asks a different question (Age?). It then throws world #2 away, but keeps the answer that it got there.

    So we are living in world #1, it asks us our name, and then magically it knows our age. The side effect from world #2 (asking us our age) cannot happen, due to referential transparency, but its result has been acquired. Whoops -- real I/O can't do that.
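
The branching can be made observable with a toy model. This is only a sketch: `World` here is a made-up pure record (pending input lines plus a log of what was printed), not GHC's `RealWorld`, and `putLine`, `getLine'` and `andThen` are hypothetical helpers.

```haskell
-- A toy, purely functional stand-in for the real world.
data World = World { stdinLines :: [String], stdoutLog :: [String] }
  deriving (Show, Eq)

newtype RawIO x = RawIO { runIO :: World -> (World, x) }

putLine :: String -> RawIO ()
putLine s = RawIO $ \w -> (w { stdoutLog = stdoutLog w ++ [s] }, ())

getLine' :: RawIO String
getLine' = RawIO $ \w -> case stdinLines w of
  (l : ls) -> (w { stdinLines = ls }, l)
  [] -> (w, "")

andThen :: RawIO a -> RawIO b -> RawIO b
andThen ma mb = RawIO $ \w -> let (w', _) = runIO ma w in runIO mb w'

-- Reuses the same starting world twice, as in the example above.
branching :: RawIO String
branching = RawIO $ \world ->
  let (world1, _x) = runIO (putLine "Name?" `andThen` getLine') world
      (_world2, y) = runIO (putLine "Age?" `andThen` getLine') world
  in (world1, y)
```

Evaluating `runIO branching (World ["Ada", "36"] [])` gives `(World {stdinLines = ["36"], stdoutLog = ["Name?"]},"Ada")`. The surviving log contains only the "Name?" prompt, yet the returned value was read in the branch that printed "Age?", whose world was discarded.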

    Well, that's OK as long as we hide the RawIO constructor! We'll just make all of our functions well-behaved and be done with it. We can then write completely sane versions of bind and return:

    return x = RawIO $ \world -> (world, x)
    bind mx x_to_my = RawIO $ \world -> let (world', x) = runIO mx world in 
        runIO (x_to_my x) world'
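
With the constructor hidden, the same return and bind can be packaged as a Monad instance so that do-notation threads the world linearly. The sketch below again uses a made-up pure `World` record in place of the real world, and `putLine`, `getLine'` and `interview` are hypothetical names.

```haskell
-- Toy world: pending input lines and everything printed so far.
data World = World { pendingInput :: [String], printed :: [String] }
  deriving (Show, Eq)

newtype RawIO x = RawIO { runIO :: World -> (World, x) }

instance Functor RawIO where
  fmap f mx = RawIO $ \w -> let (w', x) = runIO mx w in (w', f x)

instance Applicative RawIO where
  pure x = RawIO $ \w -> (w, x)
  mf <*> mx = RawIO $ \w ->
    let (w1, f) = runIO mf w
        (w2, x) = runIO mx w1
    in (w2, f x)

instance Monad RawIO where
  -- Each step receives the world produced by the previous one; no world is reused.
  mx >>= k = RawIO $ \w -> let (w', x) = runIO mx w in runIO (k x) w'

putLine :: String -> RawIO ()
putLine s = RawIO $ \w -> (w { printed = printed w ++ [s] }, ())

getLine' :: RawIO String
getLine' = RawIO $ \w -> case pendingInput w of
  (l : ls) -> (w { pendingInput = ls }, l)
  [] -> (w, "")

interview :: RawIO (String, String)
interview = do
  putLine "Name?"
  name <- getLine'
  putLine "Age?"
  age <- getLine'
  pure (name, age)
```

Here `runIO interview (World ["Ada", "36"] [])` evaluates to `(World {pendingInput = [], printed = ["Name?","Age?"]},("Ada","36"))`: both prompts appear, in order, because code written only with these combinators has no way to duplicate or drop a world.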
    

    So when we introduce side-effectful functions into the language, we can just write them a wrapper which ignores the "world" argument and performs the side-effect when the function is run. We then have:

    unsafePerformIO mx = let (_, x) = runIO mx (error "RealWorld doesn't exist") in x
    

    which can perform these I/O operations when GHC/GHCi actually needs them to happen.
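
To connect this back to the definition in the question: GHC's actual `IO` has the same state-passing shape, with an unboxed tuple and the primitive `State# RealWorld` token in place of the pair and the world value. The sketch below assumes the `IO` constructor can be imported from `GHC.Base` (an internal module, so this may vary between GHC versions); `returnIO'`, `bindIO'` and `unwrap` are hypothetical names chosen so they don't clash with anything in base.

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}

import GHC.Base (IO (..))
import GHC.Exts (RealWorld, State#)

-- Peel off the newtype to get at the underlying state-passing function.
unwrap :: IO a -> State# RealWorld -> (# State# RealWorld, a #)
unwrap (IO f) = f

-- return: hand the state token back unchanged, alongside the value.
returnIO' :: a -> IO a
returnIO' x = IO (\s -> (# s, x #))

-- bind: run the first action, then pass the new token to the second.
bindIO' :: IO a -> (a -> IO b) -> IO b
bindIO' (IO m) k = IO (\s -> case m s of
  (# s', x #) -> unwrap (k x) s')
```

For example, `bindIO' (returnIO' 20) (\n -> returnIO' (n + 22))` is an ordinary `IO` action that produces 42 when run. The token is threaded exactly like `world` in the sketch above; as far as I understand, it carries no data at run time and exists only to force the ordering.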
