Using Haskell pipes-bytestring to iterate a file by line

こ雲淡風輕ζ posted on 2019-12-06 03:37:40

Question


I am using the pipes library and need to convert a ByteString stream to a stream of lines (i.e. String), using ASCII encoding. I am aware that there are other libraries (Pipes.Text and Pipes.Prelude) that perhaps let me yield lines from a text file more easily, but because of some other code I need to be able to get lines as String from a Producer of ByteString.

More formally, I need to convert a Producer ByteString IO () to a Producer String IO (), which yields lines.

I am sure this must be a one-liner for an experienced pipes programmer, but so far I have not managed to hack my way through all the FreeT and lens trickery in pipes-bytestring.

Any help is much appreciated!

Stephan


Answer 1:


If you need that type signature, then I would suggest this:

import Control.Foldl (mconcat, purely)
import Data.ByteString (ByteString)
import Data.Text (unpack)
import Lens.Family (view)
import Pipes (Producer, (>->))
import Pipes.Group (folds)
import qualified Pipes.Prelude as Pipes
import Pipes.Text (lines)
import Pipes.Text.Encoding (utf8)
-- Since base 4.8 the Prelude also exports mconcat, so hide it to avoid a clash.
import Prelude hiding (lines, mconcat)

getLines
    :: Producer ByteString IO r -> Producer String IO (Producer ByteString IO r)
getLines p = purely folds mconcat (view (utf8 . lines) p) >-> Pipes.map unpack

This works because the type of purely folds mconcat is:

purely folds mconcat
    :: (Monad m, Monoid t) => FreeT (Producer t m) m r -> Producer t m r

... where t in this case would be Text:

purely folds mconcat
    :: Monad m => FreeT (Producer Text m) m r -> Producer Text m r

Any time you want to reduce each Producer sub-group of a FreeT-delimited stream, you probably want to use purely folds. Then it's just a matter of picking the right Fold to reduce the sub-group with. In this case, you just want to concatenate all the Text chunks within a group, so you pass in mconcat. I generally don't recommend doing this, since it buffers each whole line in memory and will break on extremely long lines, but you specified that you needed this behavior.
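
As a minimal sketch of "picking the right Fold", the snippet below applies purely folds to the same line-delimited stream with two different folds. It assumes ASCII input and goes through Pipes.ByteString.lines directly instead of the Text route shown above; the names linesAsStrings, lineLengths and sampleChunks are made up for illustration. The first fold materializes each line, the second only sums chunk lengths and so never holds a whole line in memory.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Control.Foldl (purely)
import qualified Control.Foldl as Fold
import Data.ByteString (ByteString)
import qualified Data.ByteString.Char8 as B8
import Lens.Family (view)
import Pipes (Producer, each, (>->))
import Pipes.Group (folds)
import qualified Pipes.ByteString as PB
import qualified Pipes.Prelude as P

-- Concatenate the chunks of each line, then unpack the result to a String.
linesAsStrings :: Monad m => Producer ByteString m r -> Producer String m r
linesAsStrings p =
    purely folds Fold.mconcat (view PB.lines p) >-> P.map B8.unpack

-- Same grouping, different Fold: add up the chunk lengths of each line.
lineLengths :: Monad m => Producer ByteString m r -> Producer Int m r
lineLengths p =
    purely folds (Fold.premap B8.length Fold.sum) (view PB.lines p)

-- Hypothetical input whose chunk boundaries do not line up with line breaks.
sampleChunks :: [ByteString]
sampleChunks = ["first li", "ne\nsecond\nthi", "rd\n"]

collectedLines :: [String]
collectedLines = P.toList (linesAsStrings (each sampleChunks))

collectedLengths :: [Int]
collectedLengths = P.toList (lineLengths (each sampleChunks))
```

Evaluating collectedLines and collectedLengths in GHCi is a quick way to check that lines spanning several chunks are reassembled as expected.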

This is verbose because the pipes ecosystem promotes Text over String and also tries to encourage handling arbitrarily long lines. If you were not constrained by your other code, the more idiomatic approach would just be:

view (utf8 . lines)
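
To illustrate what keeping the FreeT-delimited stream buys you, here is a hedged sketch that keeps only the first two lines of a stream without ever concatenating a line. It is written against Pipes.ByteString.lines rather than the Text version (an assumption made to stay with ASCII bytes), and firstTwoLines and sampleChunks are hypothetical names.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.ByteString (ByteString)
import qualified Data.ByteString as B
import Lens.Family (over)
import Pipes (Producer, each)
import Pipes.Group (takes)
import qualified Pipes.ByteString as PB
import qualified Pipes.Prelude as P

-- View the stream as lines, keep two of them, and view it as bytes again.
-- Chunks are passed along as they arrive; no line is buffered in full.
firstTwoLines :: Monad m => Producer ByteString m () -> Producer ByteString m ()
firstTwoLines = over PB.lines (takes 2)

-- Hypothetical input whose chunk boundaries do not line up with line breaks.
sampleChunks :: [ByteString]
sampleChunks = ["first li", "ne\nsecond\nthi", "rd\n"]

-- Concatenated only here, to make the result easy to inspect.
firstTwoBytes :: ByteString
firstTwoBytes = B.concat (P.toList (firstTwoLines (each sampleChunks)))
```

The output is still a Producer of ByteString, so it can be fed to any downstream consumer regardless of how long the individual lines are.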



Answer 2:


After a little bit of hacking and some hints from this blog, I came up with a solution, but it is surprisingly clumsy, and I fear a bit inefficient as well, as it uses ByteString.append:

import Pipes
import qualified Pipes.ByteString as PB
import qualified Pipes.Group as PG
import qualified Data.ByteString.Char8 as B
import Lens.Family (view)

-- Split the byte stream into lines, then collapse each line into one String.
getLines :: Producer PB.ByteString IO r -> Producer String IO r
getLines = PG.concats . PG.maps toStringProducer . view PB.lines

-- Drain the chunks of a single line and yield them as one String.
toStringProducer :: Producer PB.ByteString IO r -> Producer String IO r
toStringProducer = go B.empty
  where
    go acc producer = do
        x <- lift $ next producer
        case x of
            Left r -> do
                yield $ B.unpack acc
                return r
            -- Append the new chunk after the accumulated bytes to keep their order.
            Right (chunk, producer') -> go (B.append acc chunk) producer'
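
One possible way to avoid the repeated ByteString.append, sketched here under the assumption that each line fits in memory, is to collect the chunks of a line with Pipes.Prelude.toListM' and join them once with concat. The names lineToString, getLinesConcat and sampleChunks are hypothetical, and whether this is measurably faster has not been benchmarked.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.ByteString (ByteString)
import qualified Data.ByteString.Char8 as B8
import Lens.Family (view)
import Pipes (Producer, each, lift, yield)
import qualified Pipes.ByteString as PB
import qualified Pipes.Group as PG
import qualified Pipes.Prelude as P

-- Gather every chunk of one line, join them in a single pass, yield a String.
lineToString :: Monad m => Producer ByteString m r -> Producer String m r
lineToString line = do
    (chunks, r) <- lift (P.toListM' line)
    yield (B8.unpack (B8.concat chunks))
    return r

-- Same shape as getLines above, with the per-line step swapped out.
getLinesConcat :: Monad m => Producer ByteString m r -> Producer String m r
getLinesConcat = PG.concats . PG.maps lineToString . view PB.lines

-- Hypothetical input whose chunk boundaries do not line up with line breaks.
sampleChunks :: [ByteString]
sampleChunks = ["first li", "ne\nsecond\nthi", "rd\n"]

collected :: [String]
collected = P.toList (getLinesConcat (each sampleChunks))
```

Like the original, this still buffers a whole line before yielding it, so the caveat about extremely long lines applies here too.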


Source: https://stackoverflow.com/questions/25982213/using-haskell-pipes-bytestring-to-iterate-a-file-by-line
