I am using the pipes library and need to convert a ByteString stream to a stream of lines (i.e. String
), using ASCII encoding. I am aware that there are other libraries (Pipes.Text and Pipes.Prelude) that perhaps let me yield lines from a text file more easily, but because of some other code I need to be able to get lines as String
from a Producer of ByteString
.
More formally, I need to convert a Producer ByteString IO ()
to a Producer String IO ()
, which yields lines.
I am sure this must be a one-liner for an experienced Pipes-Programmer, but I so far did not manage to successfully hack through all the FreeT
and Lens
-trickery in Pipes-ByteString.
Any help is much appreciated!
Stephan
If you need that type signature, then I would suggest this:
import Control.Foldl (mconcat, purely)
import Data.ByteString (ByteString)
import Data.Text (unpack)
import Lens.Family (view)
import Pipes (Producer, (>->))
import Pipes.Group (folds)
import qualified Pipes.Prelude as Pipes
import Pipes.Text (lines)
import Pipes.Text.Encoding (utf8)
import Prelude hiding (lines)
getLines
:: Producer ByteString IO r -> Producer String IO (Producer ByteString IO r)
getLines p = purely folds mconcat (view (utf8 . lines) p) >-> Pipes.map unpack
This works because the type of purely folds mconcat
is:
purely folds mconcat
:: (Monad m, Monoid t) => FreeT (Producer t m) r -> Producer t m r
... where t
in this case would be Text
:
purely folds mconcat
:: Monad m => FreeT (Producer Text m) r -> Producer Text m r
Any time you want to reduce each Producer
sub-group of a FreeT
-delimited stream you probably want to use purely folds
. Then it's just a matter of picking the right Fold
to reduce the sub-group with. In this case, you just want to concatenate all the Text
chunks within a group, so you pass in mconcat
. I generally don't recommend doing this since it will break on extremely long lines, but you specified that you needed this behavior.
The reason this is verbose is because the pipes
ecosystem promotes Text
over String
and also tries to encourage handling arbitrarily long lines. If you were not constrained by your other code then the more idiomatic approach would just be:
view (utf8 . lines)
After a little bit of hacking and some hints from this blog, I came up with a solution, but it is surprisingly clumsy, and I fear a bit inefficient as well, as it uses ByteString.append:
import Pipes
import qualified Pipes.ByteString as PB
import qualified Pipes.Prelude as PP
import qualified Pipes.Group as PG
import qualified Data.ByteString.Char8 as B
import Lens.Family (view )
import Control.Monad (liftM)
getLines :: Producer PB.ByteString IO r -> Producer String IO r
getLines = PG.concats . PG.maps toStringProducer . view PB.lines
toStringProducer :: Producer PB.ByteString IO r -> Producer String IO r
toStringProducer producer = go producer B.empty
where
go producer bs = do
x <- lift $ next producer
case x of
Left r -> do
yield $ B.unpack bs
return r
Right (bs', producer') -> go producer' (B.append bs' bs)
来源:https://stackoverflow.com/questions/25982213/using-haskell-pipes-bytestring-to-iterate-a-file-by-line