The question says it all. More specifically, I am writing bindings to a C library, and I'm wondering which C functions I can use unsafePerformIO with. I assume using unsafePerformIO with anything involving pointers is a big no-no.
The standard trick to instantiate global mutable variables in haskell:
{-# NOINLINE bla #-}
bla :: IORef Int
bla = unsafePerformIO (newIORef 10)
I also use it to close over the global variable if I want to prevent access to it outside of functions I provide:
{-# NOINLINE printJob #-}
printJob :: String -> Bool -> IO ()
printJob = unsafePerformIO $ do
  p <- newEmptyMVar
  return $ \a b -> do
    -- here's the function code doing something
    -- with variable p, no one else can access.
    return ()  -- placeholder so the sketch is a complete do block
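As a concrete illustration of the closure trick, here is a minimal sketch that hides a counter behind a single exported action. The name nextId and the choice of an MVar-backed counter are assumptions made for this example, not part of the answer above.

```haskell
import Control.Concurrent.MVar (modifyMVar, newMVar)
import System.IO.Unsafe (unsafePerformIO)

-- Hypothetical example: hand out increasing integers from hidden global state.
-- NOINLINE keeps GHC from duplicating the unsafePerformIO call, which would
-- create more than one counter.
{-# NOINLINE nextId #-}
nextId :: IO Int
nextId = unsafePerformIO $ do
  counter <- newMVar (0 :: Int)        -- runs once, when nextId is first forced
  return $ modifyMVar counter $ \n ->  -- the returned action closes over counter
    return (n + 1, n)                  -- new state is n + 1, result is n

main :: IO ()
main = do
  a <- nextId
  b <- nextId
  print (a, b)  -- prints (0,1)
```

Nothing outside nextId can reach the MVar, so the only way to touch the state is through the action itself.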
Sure. You can have a look at a real example here, but in general unsafePerformIO can be used on any function that is semantically pure but happens to be implemented with side effects. The IO monad may still be needed to track effects (e.g. freeing memory after the value is computed) even when the function is pure (e.g. computing a factorial).
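To make that concrete, here is a small sketch of a pure function whose implementation allocates and frees a temporary buffer. The function name sumViaBuffer is hypothetical; withArrayLen and peekElemOff come from the standard Foreign modules in base.

```haskell
import Foreign.Marshal.Array (withArrayLen)
import Foreign.Storable (peekElemOff)
import System.IO.Unsafe (unsafePerformIO)

-- Hypothetical helper: sum a list by copying it into a temporary buffer.
-- withArrayLen allocates the buffer, runs the callback, and frees the buffer
-- afterwards; IO is what sequences those steps.
sumViaBuffer :: [Int] -> Int
sumViaBuffer xs = unsafePerformIO $
  withArrayLen xs $ \n ptr ->
    let go acc i
          | i >= n    = return acc
          | otherwise = do
              v <- peekElemOff ptr i
              let acc' = acc + v
              -- force the running total so no unevaluated work is left behind
              acc' `seq` go acc' (i + 1)
    in go 0 0

main :: IO ()
main = print (sumViaBuffer [1 .. 10])  -- prints 55
```

The result depends only on the input list, and the buffer is never visible to the caller, which is why hiding the IO seems defensible here.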
I'm wondering what c functions I can use unsafePerformIO with. I assume using unsafePerformIO with anything involving pointers is a big no-no.
Depends! unsafePerformIO will fully perform actions and force out all the laziness, but that doesn't mean it will break your program. In general, Haskellers prefer unsafePerformIO to appear only in pure functions, so you can use it on results of e.g. scientific computations but maybe not file reads.
No need to involve C here. The unsafePerformIO function can be used in any situation where,
You know that its use is safe, and
You are unable to prove its safety using the Haskell type system.
For instance, you can make a memoize function using unsafePerformIO:
import Control.Concurrent.MVar (modifyMVar, newMVar)
import qualified Data.Map as Map
import System.IO.Unsafe (unsafePerformIO)

memoize :: Ord a => (a -> b) -> a -> b
memoize f = unsafePerformIO $ do
  memo <- newMVar $ Map.empty
  return $ \x -> unsafePerformIO $ modifyMVar memo $ \memov ->
    return $ case Map.lookup x memov of
      Just y  -> (memov, y)
      Nothing -> let y = f x
                 in (Map.insert x y memov, y)
(This is off the top of my head, so I have no idea if there are flagrant errors in the code.)
The memoize function uses and modifies a memoization dictionary, but since the function as a whole is safe, you can give it a pure type (with no use of the IO monad). However, you have to use unsafePerformIO to do that.
Footnote: When it comes to the FFI, you are responsible for providing the types of the C functions to the Haskell system. You can achieve the effect of unsafePerformIO by simply omitting IO from the type. The FFI system is inherently unsafe, so using unsafePerformIO doesn't make much of a difference.
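The two routes described in the footnote can be put side by side. This is a sketch only: it assumes the C cos function from <math.h> is available at link time, and the Haskell names c_cos_pure, c_cos_io and cosViaIO are made up for the example.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
import System.IO.Unsafe (unsafePerformIO)

-- Route 1: declare the C function with a pure type by leaving IO out.
foreign import ccall "math.h cos"
  c_cos_pure :: Double -> Double

-- Route 2: declare it in IO, then strip the IO with unsafePerformIO.
foreign import ccall "math.h cos"
  c_cos_io :: Double -> IO Double

cosViaIO :: Double -> Double
cosViaIO = unsafePerformIO . c_cos_io

main :: IO ()
main = print (c_cos_pure 0, cosViaIO 0)  -- prints (1.0,1.0)
```

In both cases the compiler takes your word that the C function behaves like a pure function; neither declaration is checked against the C source.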
Footnote 2: There are often really subtle bugs in code that uses unsafePerformIO; the example is just a sketch of a possible use. In particular, unsafePerformIO can interact poorly with the optimizer.
The way I see it, the various unsafe* non-functions really should only be used in cases where you want to do something that respects referential transparency but whose implementation would otherwise require augmenting the compiler or runtime system to add a new primitive capability. It's easier, more modular, readable, maintainable and agile to use the unsafe stuff than to have to modify the language implementation for things like that.
FFI work often intrinsically requires you to do this sort of thing.
Obviously if it should never be used, it wouldn't be in the standard libraries. ;-)
There are a number of reasons why you might use it. Examples include:
Initialising global mutable state. (Whether you should ever have such a thing in the first place is a whole other discussion...)
Lazy I/O is implemented using this trick. (Again, whether lazy I/O is a good idea in the first place is debatable.)
The trace function uses it. (Yet again, it turns out trace is rather less useful than you might imagine.)
Perhaps most significantly, you can use it to implement data structures which are referentially transparent, but internally implemented using impure code. Often the ST monad will let you do that, but sometimes you need a little unsafePerformIO.
Lazy I/O can be seen as a special-case of the last point. So can memoisation.
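To give a feel for the trace item in the list above, here is a rough sketch of a trace-like function built on unsafePerformIO. The name traceMsg is hypothetical, and this is not a copy of how Debug.Trace is actually implemented.

```haskell
import System.IO (hPutStrLn, stderr)
import System.IO.Unsafe (unsafePerformIO)

-- Hypothetical trace-like helper: write a message to stderr when the
-- result is demanded, then return the value unchanged.
{-# NOINLINE traceMsg #-}
traceMsg :: String -> a -> a
traceMsg msg x = unsafePerformIO $ do
  hPutStrLn stderr msg
  return x

main :: IO ()
main = print (traceMsg "evaluating" (2 + 3 :: Int))
```

Because the message is tied to evaluation rather than to program order, when (and how often) it appears depends on laziness and on what the optimizer does.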
Consider, for example, an "immutable", growable array. Internally you could implement that as a pure "handle" that points to a mutable array. The handle holds the user-visible size of the array, but the actual underlying mutable array is larger than that. When the user "appends" to the array, a new handle is returned, with a new, larger size, but the append is performed by mutating the underlying mutable array.
You can't do this with the ST monad. (Or rather, you can, but it still requires unsafePerformIO.)
Note that it's damned tricky to get this sort of thing right. And the type checker won't catch it if you're wrong. (That's what unsafePerformIO does: it makes the type checker not check that you're doing it correctly!) For example, if you append to an "old" handle, the correct thing to do would be to copy the underlying mutable array. Forget this, and your code will behave very strangely.
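A minimal sketch of the growable array described above might look like the following. Everything here is an assumption made for illustration: the Vec type, the function names, and the growth policy are invented, the array package is assumed to be available, and the code is not thread-safe.

```haskell
import Data.Array.IO (IOArray, getBounds, newArray_, readArray, writeArray)
import Data.IORef (IORef, newIORef, readIORef, writeIORef)
import System.IO.Unsafe (unsafePerformIO)

-- Hypothetical handle: a visible length plus a shared, larger mutable buffer.
data Vec a = Vec
  Int              -- user-visible size of this handle
  (IOArray Int a)  -- underlying buffer, shared between handles
  (IORef Int)      -- how many buffer slots have been written so far

{-# NOINLINE fromList #-}
fromList :: [a] -> Vec a
fromList xs = unsafePerformIO $ do
  let n = length xs
  buf <- newArray_ (0, max 4 (2 * n) - 1)
  mapM_ (uncurry (writeArray buf)) (zip [0 ..] xs)
  used <- newIORef n
  return (Vec n buf used)

{-# NOINLINE append #-}
append :: Vec a -> a -> Vec a
append (Vec n buf used) x = unsafePerformIO $ do
  u <- readIORef used
  (_, hi) <- getBounds buf
  if u == n && n <= hi
    then do
      -- newest handle and spare capacity: mutate the shared buffer in place
      writeArray buf n x
      writeIORef used (n + 1)
      return (Vec (n + 1) buf used)
    else do
      -- "old" handle or full buffer: copy the visible prefix to a fresh buffer
      buf' <- newArray_ (0, max 4 (2 * (n + 1)) - 1)
      mapM_ (\i -> readArray buf i >>= writeArray buf' i) [0 .. n - 1]
      writeArray buf' n x
      used' <- newIORef (n + 1)
      return (Vec (n + 1) buf' used')

toList :: Vec a -> [a]
toList (Vec n buf _) = unsafePerformIO (mapM (readArray buf) [0 .. n - 1])

main :: IO ()
main = do
  let v = fromList [1, 2, 3 :: Int]
  print (toList (append v 4))  -- prints [1,2,3,4]
  print (toList (append v 5))  -- prints [1,2,3,5]; v is now an "old" handle
  print (toList v)             -- prints [1,2,3]
```

The in-place branch only ever writes to a slot beyond every existing handle's visible length, which is what keeps older handles unchanged. Dropping the copy in the other branch would reproduce exactly the strange behaviour warned about above.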
Now, to answer your real question: There's no particular reason why "anything involving pointers" should be a no-no for unsafePerformIO. When asking whether to use this function or not, the only question of significance is this: Can the end-user observe any side-effects from doing this?
If the only thing it does is create some buffer somewhere that the user can't "see" from pure code, that's fine. If it writes to a file on disk... not so fine.
HTH.
In the specific case of the FFI, unsafePerformIO is meant to be used for calling things that are mathematical functions, i.e. the output depends solely on the input parameters, and every time the function is called with the same inputs, it will return the same output. Also, the function shouldn't have side effects, such as modifying data on disk, or mutating memory.
Most functions from <math.h> could be called with unsafePerformIO, for example.
You're correct that unsafePerformIO and pointers don't usually mix. For example, suppose you have
double p_sin(double *p) { return sin(*p); }
Even though you're just reading a value from a pointer, it's not safe to use unsafePerformIO. If you wrap p_sin, multiple calls can use the same pointer argument but get different results. It's necessary to keep the function in IO to ensure that it's sequenced properly in relation to pointer updates.
This example should make clear one reason why this is unsafe:
# file export.c
#include <math.h>
double p_sin(double *p) { return sin(*p); }
# file main.hs
{-# LANGUAGE ForeignFunctionInterface #-}
import Foreign.Ptr
import Foreign.Marshal.Alloc
import Foreign.Storable
foreign import ccall "p_sin"
  p_sin :: Ptr Double -> Double
foreign import ccall "p_sin"
  safeSin :: Ptr Double -> IO Double
main :: IO ()
main = do
  p <- malloc
  let sin1 = p_sin p
      sin2 = safeSin p
  poke p 0
  putStrLn $ "unsafe: " ++ show sin1
  sin2 >>= \x -> putStrLn $ "safe: " ++ show x
  poke p 1
  putStrLn $ "unsafe: " ++ show sin1
  sin2 >>= \x -> putStrLn $ "safe: " ++ show x
When compiled, this program outputs
$ ./main
unsafe: 0.0
safe: 0.0
unsafe: 0.0
safe: 0.8414709848078965
Even though the value referenced by the pointer has changed between the two references to "sin1", the expression isn't re-evaluated, leading to stale data being used. Since safeSin (and hence sin2) is in IO, the program is forced to re-evaluate the expression, so the updated pointer data is used instead.