When is unsafeInterleaveIO unsafe?

前端 未结 4 764
失恋的感觉
失恋的感觉 2021-01-31 05:43

Unlike other unsafe* operations, the documentation for unsafeInterleaveIO is not very clear about its possible pitfalls. So exactly when is it unsafe? I would like

相关标签:
4条回答
  • 2021-01-31 05:55

    At the top, the two functions you have are always identical.

    v1 = do !a <- x
            y
    
    v2 = do !a <- unsafeInterleaveIO x
            y
    

    Remember that unsafeInterleaveIO defers the IO operation until its result is forced -- yet you are forcing it immediately by using a strict pattern match !a, so the operation is not deferred at all. So v1 and v2 are exactly the same.

    In general

    In general, it is up to you to prove that your use of unsafeInterleaveIO is safe. If you call unsafeInterleaveIO x, then you have to prove that x can be called at any time and still produce the same output.

    Modern sentiment about Lazy IO

    ...is that Lazy IO is dangerous and a bad idea 99% of the time.

    The chief problem that it is trying to solve is that IO has to be done in the IO monad, but you want to be able to do incremental IO and you don't want to rewrite all of your pure functions to call IO callbacks to get more data. Incremental IO is important because it uses less memory, allowing you to operate on data sets that don't fit in memory without changing your algorithms too much.

    Lazy IO's solution is to do IO outside of the IO monad. This is not generally safe.

    Today, people are solving the problem of incremental IO in different ways by using libraries like Conduit or Pipes. Conduit and Pipes are much more deterministic and well-behaved than Lazy IO, solve the same problems, and do not require unsafe constructs.

    Remember that unsafeInterleaveIO is really just unsafePerformIO with a different type.

    Example

    Here is an example of a program that is broken due to lazy IO:

    rot13 :: Char -> Char
    rot13 x 
      | (x >= 'a' && x <= 'm') || (x >= 'A' && x <= 'M') = toEnum (fromEnum x + 13)
      | (x >= 'n' && x <= 'z') || (x >= 'N' && x <= 'Z') = toEnum (fromEnum x - 13)
      | otherwise = x 
    
    rot13file :: FilePath -> IO ()
    rot13file path = do
      x <- readFile path
      let y = map rot13 x
      writeFile path y
    
    main = rot13file "test.txt"
    

    This program will not work. Replacing the lazy IO with strict IO will make it work.

    Links

    From Lazy IO breaks purity by Oleg Kiselyov on the Haskell mailing list:

    We demonstrate how lazy IO breaks referential transparency. A pure function of the type Int->Int->Int gives different integers depending on the order of evaluation of its arguments. Our Haskell98 code uses nothing but the standard input. We conclude that extolling the purity of Haskell and advertising lazy IO is inconsistent.

    ...

    Lazy IO should not be considered good style. One of the common definitions of purity is that pure expressions should evaluate to the same results regardless of evaluation order, or that equals can be substituted for equals. If an expression of the type Int evaluates to 1, we should be able to replace every occurrence of the expression with 1 without changing the results and other observables.

    From Lazy vs correct IO by Oleg Kiselyov on the Haskell mailing list:

    After all, what could be more against the spirit of Haskell than a `pure' function with observable side effects. With Lazy IO, one indeed has to choose between correctness and performance. The appearance of such code is especially strange after the evidence of deadlocks with Lazy IO, presented on this list less than a month ago. Let alone unpredictable resource usage and reliance on finalizers to close files (forgetting that GHC does not guarantee that finalizers will be run at all).

    Kiselyov wrote the Iteratee library, which was the first real alternative to lazy IO.

    0 讨论(0)
  • 2021-01-31 05:58

    Laziness means that when (and whether) exactly a computation is actually carried out depends on when (and whether) the runtime implementation decides it needs the value. As a Haskell programmer you completely relinquish control over the evaluation order (except by the data dependencies inherent in your code, and when you start playing with strictness to force the runtime to make certain choices).

    That's great for pure computations, because the result of a pure computation will be exactly the same whenever you do it (except that if you carry out computations that you don't actually need, you might encounter errors or fail to terminate, when another evaluation order might allow the program to terminate successfully; but all non-bottom values computed by any evaluation order will be the same).

    But when you're writing IO-dependent code, evaluation order matters. The whole point of IO is to provide a mechanism for building computations whose steps depend on and affect the world outside the program, and an important part of doing that is that those steps are explicitly sequenced. Using unsafeInterleaveIO throws away that explicit sequencing, and relinquishes control of when (and whether) the IO operation is actually carried out to the runtime system.

    This is unsafe in general for IO operations, because there may be dependencies between their side-effects which cannot be inferred from the data dependencies inside the program. For example, one IO action might create a file with some data in it, and another IO action might read the same file. If they're both executed "lazily", then they'll only get run when the resulting Haskell value is needed. Creating the file is probably IO () though, and it's quite possible that the () is never needed. That could mean that the read operation is carried out first, either failing or reading data that was already in the file, but not the data that should have been put there by the other operation. There's no guarantee that the runtime system will execute them in the right order. To program correctly with a system that always did this for IO you'd have to be able to accurately predict the order in which the Haskell runtime will choose to perform the various IO actions.

    Treat unsafeInterlaveIO as promise to the compiler (which it cannot verify, it's just going to trust you) that it doesn't matter when the IO action is carried out, or whether it's elided entirely. This is really what all the unsafe* functions are; they provide facilities that are not safe in general, and for which safety cannot be automatically checked, but which can be safe in particular instances. The onus is on you to ensure that your use of them is in fact safe. But if you make a promise to the compiler, and your promise is false, then unpleasant bugs can be the result. The "unsafe" in the name is to scare you into thinking about your particular case and deciding whether you really can make the promise to the compiler.

    0 讨论(0)
  • 2021-01-31 06:01

    Basically everything under "Update" in the question is so confused it's not even wrong, so please try to forget it when you're trying to understand my answer.

    Look at this function:

    badLazyReadlines :: Handle -> IO [String]
    badLazyReadlines h = do
      l <- unsafeInterleaveIO $ hGetLine h
      r <- unsafeInterleaveIO $ badLazyReadlines h
      return (l:r)
    

    In addition to what I'm trying to illustrate: the above function also doesn't handle reaching the end of the file. But ignore that for now.

    main = do
      h <- openFile "example.txt" ReadMode
      lns <- badLazyReadlines h
      putStrLn $ lns ! 4
    

    This will print the first line of "example.txt", because the 5th element in the list is actually the first line that's read from the file.

    0 讨论(0)
  • 2021-01-31 06:03

    Your joinIO and joinIO' are not semantically equivalent. They will usually be the same, but there's a subtlety involved: a bang pattern makes a value strict, but that's all it does. Bang patterns are implemented using seq, and that does not enforce a particular evaluation order, in particular the following two are semantically equivalent:

    a `seq` b `seq` c
    b `seq` a `seq` c
    

    GHC can evaluate either b or a first before returning c. Indeed, it can evaluate c first, then a and b, then return c. Or, if it can statically prove a or b are non-bottom, or that c is bottom, it doesn't have to evaluate a or b at all. Some optimisations do genuinely make use of this fact, but it doesn't come up very often in practice.

    unsafeInterleaveIO, by contrast, is sensitive to all or any of those changes – it does not depend on the semantic property of how strict some function is, but the operational property of when something is evaluated. So all of the above transformations are visible to it, which is why it's only reasonable to view unsafeInterleaveIO as performing its IO non-deterministically, more or less whenever it feels appropriate.

    This is, in essence, why unsafeInterleaveIO is unsafe - it is the only mechanism in normal use that can detect transformations that ought to be meaning-preserving. It's the only way you can detect evaluation, which by rights ought to be impossible.

    As an aside, it's probably fair to mentally prepend unsafe to every function from GHC.Prim, and probably several other GHC. modules as well. They're certainly not ordinary Haskell.

    0 讨论(0)
提交回复
热议问题