using haskell pipes-bytestring to iterate a file by line

*爱你&永不变心* 提交于 2019-12-04 07:45:14

If you need that type signature, then I would suggest this:

import Control.Foldl (mconcat, purely)
import Data.ByteString (ByteString)
import Data.Text (unpack)
import Lens.Family (view)
import Pipes (Producer, (>->))
import Pipes.Group (folds)
import qualified Pipes.Prelude as Pipes
import Pipes.Text (lines)
import Pipes.Text.Encoding (utf8)
import Prelude hiding (lines)

getLines
    :: Producer ByteString IO r -> Producer String IO (Producer ByteString IO r)
getLines p = purely folds mconcat (view (utf8 . lines) p) >-> Pipes.map unpack

This works because the type of purely folds mconcat is:

purely folds mconcat
    :: (Monad m, Monoid t) => FreeT (Producer t m) r -> Producer t m r

... where t in this case would be Text:

purely folds mconcat
    :: Monad m => FreeT (Producer Text m) r -> Producer Text m r

Any time you want to reduce each Producer sub-group of a FreeT-delimited stream you probably want to use purely folds. Then it's just a matter of picking the right Fold to reduce the sub-group with. In this case, you just want to concatenate all the Text chunks within a group, so you pass in mconcat. I generally don't recommend doing this since it will break on extremely long lines, but you specified that you needed this behavior.

The reason this is verbose is because the pipes ecosystem promotes Text over String and also tries to encourage handling arbitrarily long lines. If you were not constrained by your other code then the more idiomatic approach would just be:

view (utf8 . lines)

After a little bit of hacking and some hints from this blog, I came up with a solution, but it is surprisingly clumsy, and I fear a bit inefficient as well, as it uses ByteString.append:

import Pipes
import qualified Pipes.ByteString as PB
import qualified Pipes.Prelude as PP
import qualified Pipes.Group as PG
import qualified Data.ByteString.Char8 as B
import Lens.Family (view )
import Control.Monad (liftM)

getLines :: Producer PB.ByteString IO r -> Producer String IO r
getLines = PG.concats . PG.maps toStringProducer . view PB.lines

toStringProducer :: Producer PB.ByteString IO r -> Producer String IO r
toStringProducer producer = go producer B.empty
  where
    go producer bs = do
        x <- lift $ next producer
        case x of
            Left r -> do
                yield $ B.unpack bs
                return r
            Right (bs', producer') -> go producer' (B.append bs' bs)
标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!