Unlike other unsafe* operations, the documentation for unsafeInterleaveIO
is not very clear about its possible pitfalls. So exactly when is it unsafe? I would like
At the top, the two functions you have are always identical.
v1 = do !a <- x
        y
v2 = do !a <- unsafeInterleaveIO x
        y
Remember that unsafeInterleaveIO defers the IO operation until its result is forced -- yet you are forcing it immediately by using a strict pattern match !a, so the operation is not deferred at all. So v1 and v2 are exactly the same.
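To make the difference between a deferred and an immediately forced action visible, here is a minimal sketch. The helper logMsg and the names forcedAtOnce and deferred are made up for illustration; the log in an IORef simply records the order in which the actions really ran.

```haskell
{-# LANGUAGE BangPatterns #-}
import Control.Exception (evaluate)
import Data.IORef (IORef, newIORef, modifyIORef', readIORef)
import System.IO.Unsafe (unsafeInterleaveIO)

-- Hypothetical helper: append a message to a log kept in an IORef.
logMsg :: IORef [String] -> String -> IO ()
logMsg ref s = modifyIORef' ref (++ [s])

-- The bang pattern demands the result at the bind, so the "deferred"
-- action has already run by the time "y" is logged.
forcedAtOnce :: IO [String]
forcedAtOnce = do
  ref <- newIORef []
  !_a <- unsafeInterleaveIO (logMsg ref "x" >> return (1 :: Int))
  logMsg ref "y"
  readIORef ref

-- Without the bang, the action only runs when its result is first demanded.
deferred :: IO [String]
deferred = do
  ref <- newIORef []
  a <- unsafeInterleaveIO (logMsg ref "x" >> return (1 :: Int))
  logMsg ref "y"
  _ <- evaluate a          -- first demand for a: "x" is logged only now
  readIORef ref

main :: IO ()
main = do
  forcedAtOnce >>= print
  deferred >>= print
```

With GHC I would expect the first log to show "x" before "y" and the second to show "y" before "x", which is the whole difference between the two styles.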
In general, it is up to you to prove that your use of unsafeInterleaveIO is safe. If you call unsafeInterleaveIO x, then you have to prove that x can be called at any time and still produce the same output.
...is that Lazy IO is dangerous and a bad idea 99% of the time.
The chief problem that it is trying to solve is that IO has to be done in the IO monad, but you want to be able to do incremental IO and you don't want to rewrite all of your pure functions to call IO callbacks to get more data. Incremental IO is important because it uses less memory, allowing you to operate on data sets that don't fit in memory without changing your algorithms too much.
Lazy IO's solution is to do IO outside of the IO monad. This is not generally safe.
Today, people are solving the problem of incremental IO in different ways by using libraries like Conduit or Pipes. Conduit and Pipes are much more deterministic and well-behaved than Lazy IO, solve the same problems, and do not require unsafe constructs.
Remember that unsafeInterleaveIO is really just unsafePerformIO with a different type.
Here is an example of a program that is broken due to lazy IO:
rot13 :: Char -> Char
rot13 x
  | (x >= 'a' && x <= 'm') || (x >= 'A' && x <= 'M') = toEnum (fromEnum x + 13)
  | (x >= 'n' && x <= 'z') || (x >= 'N' && x <= 'Z') = toEnum (fromEnum x - 13)
  | otherwise = x

rot13file :: FilePath -> IO ()
rot13file path = do
  x <- readFile path
  let y = map rot13 x
  writeFile path y

main = rot13file "test.txt"
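This program will not work: readFile reads lazily, so the file is still open for reading when writeFile tries to open it for writing, and with GHC the write fails with a "resource busy (file is locked)" error. Replacing the lazy IO with strict IO will make it work.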
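One way to make the read strict using only base is to force the whole string before writing. This is a sketch, not the only fix: the name rot13fileStrict and the demo file name are made up, and it relies on the documented behaviour that a lazily read handle is closed once its entire contents have been consumed.

```haskell
import Control.Exception (evaluate)

rot13 :: Char -> Char
rot13 x
  | (x >= 'a' && x <= 'm') || (x >= 'A' && x <= 'M') = toEnum (fromEnum x + 13)
  | (x >= 'n' && x <= 'z') || (x >= 'N' && x <= 'Z') = toEnum (fromEnum x - 13)
  | otherwise = x

-- Hypothetical strict variant of rot13file.
rot13fileStrict :: FilePath -> IO ()
rot13fileStrict path = do
  x <- readFile path
  -- Computing the length walks the whole string, so the file is read to
  -- the end and its handle is closed before we reopen it for writing.
  _ <- evaluate (length x)
  writeFile path (map rot13 x)

main :: IO ()
main = do
  let demo = "rot13-demo.txt"          -- made-up file name for the demo
  writeFile demo "Hello, World!\n"
  rot13fileStrict demo
  readFile demo >>= putStr             -- prints "Uryyb, Jbeyq!"
```

Strict reads from the bytestring or text packages achieve the same thing without the explicit forcing step.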
From Lazy IO breaks purity by Oleg Kiselyov on the Haskell mailing list:
We demonstrate how lazy IO breaks referential transparency. A pure function of the type Int->Int->Int gives different integers depending on the order of evaluation of its arguments. Our Haskell98 code uses nothing but the standard input. We conclude that extolling the purity of Haskell and advertising lazy IO is inconsistent....
Lazy IO should not be considered good style. One of the common definitions of purity is that pure expressions should evaluate to the same results regardless of evaluation order, or that equals can be substituted for equals. If an expression of the type Int evaluates to 1, we should be able to replace every occurrence of the expression with 1 without changing the results and other observables.
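The following is not Kiselyov's code, just a small sketch of the same effect under stated assumptions: instead of standard input it uses a counter in an IORef (the hypothetical nextValue), and it uses pseq from GHC.Conc to pin down which argument is evaluated first. Both functions are "a - b" as far as pure semantics are concerned.

```haskell
import Control.Exception (evaluate)
import Data.IORef (IORef, newIORef, atomicModifyIORef')
import GHC.Conc (pseq)
import System.IO.Unsafe (unsafeInterleaveIO)

-- Hypothetical stand-in for "read the next number from input":
-- successive calls return 0, 1, 2, ...
nextValue :: IORef Int -> IO Int
nextValue ref = atomicModifyIORef' ref (\n -> (n + 1, n))

-- Both compute a - b; they differ only in which argument is forced first.
subLeftFirst, subRightFirst :: Int -> Int -> Int
subLeftFirst  a b = a `pseq` (b `pseq` (a - b))
subRightFirst a b = b `pseq` (a `pseq` (a - b))

-- Feed a function two lazily produced "inputs" and evaluate the result.
runWith :: (Int -> Int -> Int) -> IO Int
runWith f = do
  ref <- newIORef 0
  a <- unsafeInterleaveIO (nextValue ref)
  b <- unsafeInterleaveIO (nextValue ref)
  evaluate (f a b)

main :: IO ()
main = do
  runWith subLeftFirst  >>= print
  runWith subRightFirst >>= print
```

With GHC I would expect the two calls to print -1 and 1 respectively: whichever argument is demanded first receives 0, so two functions that should be interchangeable give different answers.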
From Lazy vs correct IO by Oleg Kiselyov on the Haskell mailing list:
After all, what could be more against the spirit of Haskell than a `pure' function with observable side effects. With Lazy IO, one indeed has to choose between correctness and performance. The appearance of such code is especially strange after the evidence of deadlocks with Lazy IO, presented on this list less than a month ago. Let alone unpredictable resource usage and reliance on finalizers to close files (forgetting that GHC does not guarantee that finalizers will be run at all).
Kiselyov wrote the Iteratee library, which was the first real alternative to lazy IO.
Laziness means that when (and whether) exactly a computation is actually carried out depends on when (and whether) the runtime implementation decides it needs the value. As a Haskell programmer you completely relinquish control over the evaluation order (except by the data dependencies inherent in your code, and when you start playing with strictness to force the runtime to make certain choices).
That's great for pure computations, because the result of a pure computation will be exactly the same whenever you do it (except that if you carry out computations that you don't actually need, you might encounter errors or fail to terminate, when another evaluation order might allow the program to terminate successfully; but all non-bottom values computed by any evaluation order will be the same).
But when you're writing IO-dependent code, evaluation order matters. The whole point of IO is to provide a mechanism for building computations whose steps depend on and affect the world outside the program, and an important part of doing that is that those steps are explicitly sequenced. Using unsafeInterleaveIO throws away that explicit sequencing, and relinquishes control of when (and whether) the IO operation is actually carried out to the runtime system.
This is unsafe in general for IO operations, because there may be dependencies between their side-effects which cannot be inferred from the data dependencies inside the program. For example, one IO action might create a file with some data in it, and another IO action might read the same file. If they're both executed "lazily", then they'll only get run when the resulting Haskell value is needed. Creating the file is probably IO () though, and it's quite possible that the () is never needed. That could mean that the read operation is carried out first, either failing or reading data that was already in the file, but not the data that should have been put there by the other operation. There's no guarantee that the runtime system will execute them in the right order. To program correctly with a system that always did this for IO you'd have to be able to accurately predict the order in which the Haskell runtime will choose to perform the various IO actions.
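The file scenario can be reproduced in a few lines. This is only a sketch: readAfterDeferredWrite and the file name are invented for the demo, and the deferred write stands in for the "create a file" action whose () result nobody ever needs.

```haskell
import Control.Exception (evaluate)
import System.IO.Unsafe (unsafeInterleaveIO)

-- Hypothetical demo: a write whose () result is never demanded,
-- followed by a read of the same file.
readAfterDeferredWrite :: FilePath -> IO String
readAfterDeferredWrite path = do
  writeFile path "old contents"
  -- The result of this action is never used, so the write never happens.
  _ <- unsafeInterleaveIO (writeFile path "new contents")
  s <- readFile path
  _ <- evaluate (length s)   -- read the file completely before returning
  return s

main :: IO ()
main = readAfterDeferredWrite "interleave-demo.txt" >>= putStrLn
```

I would expect this to print the old contents, because nothing ever forces the deferred write. Nothing in the types warns you about that.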
Treat unsafeInterleaveIO as a promise to the compiler (which it cannot verify; it's just going to trust you) that it doesn't matter when the IO action is carried out, or whether it's elided entirely. This is really what all the unsafe* functions are; they provide facilities that are not safe in general, and for which safety cannot be automatically checked, but which can be safe in particular instances. The onus is on you to ensure that your use of them is in fact safe. But if you make a promise to the compiler, and your promise is false, then unpleasant bugs can be the result. The "unsafe" in the name is to scare you into thinking about your particular case and deciding whether you really can make the promise to the compiler.
Basically everything under "Update" in the question is so confused it's not even wrong, so please try to forget it when you're trying to understand my answer.
Look at this function:
import System.IO
import System.IO.Unsafe (unsafeInterleaveIO)

badLazyReadlines :: Handle -> IO [String]
badLazyReadlines h = do
  l <- unsafeInterleaveIO $ hGetLine h
  r <- unsafeInterleaveIO $ badLazyReadlines h
  return (l:r)
Apart from the problem I'm trying to illustrate, the above function also doesn't handle reaching the end of the file. But ignore that for now.
main = do
  h <- openFile "example.txt" ReadMode
  lns <- badLazyReadlines h
  putStrLn $ lns !! 4
This will print the first line of "example.txt", because the 5th element in the list is actually the first line that's read from the file.
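For contrast, here is a sketch of a variant in which each read is nested inside the deferred action for the previous list cell, so the lines can only be read in file order. The name lazyReadLines and the demo file are made up; the structure is similar in spirit to how hGetContents behaves, and it is still lazy IO with all the caveats discussed here.

```haskell
import System.IO
import System.IO.Unsafe (unsafeInterleaveIO)

-- Hypothetical ordered variant: producing cell n+1 requires having
-- produced cell n, so hGetLine calls happen in file order.
lazyReadLines :: Handle -> IO [String]
lazyReadLines h = unsafeInterleaveIO $ do
  eof <- hIsEOF h
  if eof
    then return []
    else do
      l <- hGetLine h            -- runs as soon as this cell is demanded
      r <- lazyReadLines h       -- the tail stays deferred until demanded
      return (l : r)

main :: IO ()
main = do
  let demo = "lines-demo.txt"    -- made-up file name for the demo
  writeFile demo (unlines ["line " ++ show n | n <- [1 .. 6 :: Int]])
  h <- openFile demo ReadMode
  lns <- lazyReadLines h
  putStrLn (lns !! 4)            -- prints "line 5"
  hClose h
```

The only change is where the deferral sits: one unsafeInterleaveIO around the whole step, rather than separate ones around the head and the tail.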
Your joinIO and joinIO' are not semantically equivalent. They will usually be the same, but there's a subtlety involved: a bang pattern makes a value strict, but that's all it does. Bang patterns are implemented using seq, and that does not enforce a particular evaluation order. In particular, the following two are semantically equivalent:
a `seq` b `seq` c
b `seq` a `seq` c
GHC can evaluate either b or a first before returning c. Indeed, it can evaluate c first, then a and b, then return c. Or, if it can statically prove a or b are non-bottom, or that c is bottom, it doesn't have to evaluate a or b at all. Some optimisations do genuinely make use of this fact, but it doesn't come up very often in practice.
unsafeInterleaveIO, by contrast, is sensitive to all or any of those changes – it does not depend on the semantic property of how strict some function is, but the operational property of when something is evaluated. So all of the above transformations are visible to it, which is why it's only reasonable to view unsafeInterleaveIO as performing its IO non-deterministically, more or less whenever it feels appropriate.
This is, in essence, why unsafeInterleaveIO is unsafe - it is the only mechanism in normal use that can detect transformations that ought to be meaning-preserving. It's the only way you can detect evaluation, which by rights ought to be impossible.
As an aside, it's probably fair to mentally prepend unsafe to every function from GHC.Prim, and probably several other GHC. modules as well. They're certainly not ordinary Haskell.