I've been using Data.Binary to serialize data to files. In my application I incrementally add items to these files. The two most popular serialization packages, binary and cereal, both serialize lists strictly, writing a length prefix followed by the elements, which makes it awkward to append new items to an existing file.
So I say stick with Data.Binary, but write a new instance for growable lists. Here's the current (strict) instance:
instance Binary a => Binary [a] where
    put l = put (length l) >> mapM_ put l
    get   = do n <- get :: Get Int
               getMany n

-- | 'getMany n' gets 'n' elements in order, without blowing the stack.
getMany :: Binary a => Int -> Get [a]
getMany n = go [] n
  where
    go xs 0 = return $! reverse xs
    go xs i = do x <- get
                 x `seq` go (x:xs) (i-1)
{-# INLINE getMany #-}
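The length prefix is what makes this instance awkward for incremental writing: the element count goes at the front, so appending items later means rewriting that prefix rather than just adding bytes at the end. A quick GHCi check (the byte counts assume a 64-bit machine and binary's usual encoding of Int):

ghci> import Data.Binary (encode)
ghci> import qualified Data.ByteString.Lazy as BL
ghci> BL.length (encode ([1,2,3] :: [Int]))
32   -- an 8-byte length prefix followed by three 8-byte Ints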
Now, a version that lets you stream (in binary), so that you can append to a file, would need to be either eager or lazy. The lazy version is the most trivial. Something like:
import Data.Binary

newtype Stream a = Stream { unstream :: [a] }

instance Binary a => Binary (Stream a) where
    put (Stream [])     = putWord8 0
    put (Stream (x:xs)) = putWord8 1 >> put x >> put (Stream xs)

    get = do
        t <- getWord8
        case t of
            0 -> return (Stream [])
            1 -> do x <- get
                    Stream xs <- get
                    return (Stream (x:xs))
Massaged appropriately, that works for streaming. Now, to handle silent appending, we'll need to be able to seek to the end of the file and overwrite the final 0 tag before adding more elements.
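A minimal sketch of both the initial write and the append step, assuming the Stream instance above (writeStream and appendStream are names I've made up; the append assumes the file already ends with the 0 tag that the Stream encoding writes):

import System.IO
import qualified Data.ByteString.Lazy as BL
import Data.Binary (Binary, encode)

-- Write an initial file: each element tagged with 1, terminated by a 0 tag.
writeStream :: Binary a => FilePath -> [a] -> IO ()
writeStream path xs = BL.writeFile path (encode (Stream xs))

-- Append by overwriting the trailing 0 tag with the encoding of the new
-- elements, which itself ends in a fresh 0 tag, so the file stays well formed.
appendStream :: Binary a => FilePath -> [a] -> IO ()
appendStream path xs =
    withBinaryFile path ReadWriteMode $ \h -> do
        size <- hFileSize h
        hSeek h AbsoluteSeek (size - 1)   -- the position of the final 0 tag
        BL.hPut h (encode (Stream xs))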
It's been four years since this question was answered, but I ran into the same problems as gatoatigrado describes in the comments on Don Stewart's answer. The put method works as advertised, but get reads the whole input. I believe the problem lies in the pattern match in the case statement, Stream xs <- get, which must determine whether or not the remaining input is a Stream a before returning.
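A small way to observe that behaviour (a sketch; the file name and element type are assumptions):

import Data.Binary (decode)
import qualified Data.ByteString.Lazy as BL

-- Even though only the first element is demanded, get has to parse the
-- entire file before decode can return the Stream at all.
main :: IO ()
main = do
    bytes <- BL.readFile "items.bin"
    print (head (unstream (decode bytes :: Stream Int)))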
My solution used the example in Data.Binary.Get as a starting point:
import Data.ByteString.Lazy (ByteString, toChunks)
import Data.Binary (Binary(..), getWord8)
import Data.Binary.Get (Get, Decoder(..), pushChunk, runGetIncremental)
import Data.List (unfoldr)

decodes :: Binary a => ByteString -> [a]
decodes = runGets (getWord8 >> get)

runGets :: Get a -> ByteString -> [a]
runGets g = unfoldr (decode1 d) . toChunks
  where
    d = runGetIncremental g                 -- a fresh decoder for each element
    decode1 _ []     = Nothing              -- no more chunks: stop unfolding
    decode1 d (x:xs) = case d `pushChunk` x of
        Fail _ _ str  -> error str
        Done x' _ a   -> Just (a, x' : xs)  -- keep the leftover chunk for the next element
        k@(Partial _) -> decode1 k xs       -- need more input: feed the next chunk
Note the use of getWord8. This reads the tag byte that put writes for the encoded [] and : constructors of the Stream instance. Also note that because getWord8 simply discards those tags rather than inspecting them, this implementation will not detect the end of the list. My encoded file was just a single list, so it works in that case, but otherwise you'll need to modify it.
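If you do need to detect the end of the list, one possible modification (my own sketch, reusing runGets and the imports above plus Data.Maybe; getTagged and decodesStream are made-up names) is to inspect the tag byte instead of discarding it:

import Data.Maybe (fromJust, isJust)

-- Read one tag byte: 0 marks the end of the Stream, anything else is
-- followed by a single element.
getTagged :: Binary a => Get (Maybe a)
getTagged = do
    t <- getWord8
    case t of
        0 -> return Nothing
        _ -> fmap Just get

-- Stop at the first end-of-list tag instead of decoding past it.
decodesStream :: Binary a => ByteString -> [a]
decodesStream = map fromJust . takeWhile isJust . runGets getTagged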
In any case, this decodes ran in constant memory whether I accessed the head or the last element of the list.
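For completeness, a usage sketch (the file name and element type are assumptions):

import qualified Data.ByteString.Lazy as BL

main :: IO ()
main = do
    bytes <- BL.readFile "items.bin"
    print (last (decodes bytes :: [Int]))   -- forces the whole list, but memory stays flat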