What is pipes/conduit trying to solve

失恋的感觉, 2021-01-30 03:28

I have seen people recommending pipes/conduit library for various lazy IO related tasks. What problem do these libraries solve exactly?

Also, when I try to use some Hackage packages for parsing, there seem to be several variants of the same library (e.g. attoparsec, pipes-attoparsec and attoparsec-conduit). For my parsing tasks, should I use attoparsec or pipes-attoparsec/attoparsec-conduit, and what benefit does the pipes/conduit version give me compared to plain vanilla attoparsec?

3 Answers
  •  执笔经年
    2021-01-30 03:49

    If you want to use attoparsec, use attoparsec

    For my parsing tasks should I use attoparsec or pipes-attoparsec/attoparsec-conduit?

    Both pipes-attoparsec and attoparsec-conduit transform a given attoparsec Parser into a sink/conduit or pipe. Therefore you have to use attoparsec either way.
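    A minimal sketch of the plain-attoparsec route (the parser and its input are made up for illustration):

    {-# LANGUAGE OverloadedStrings #-}
    import Data.Attoparsec.ByteString.Char8 (Parser, parseOnly, string, decimal)

    -- a hypothetical header parser: the literal "v" followed by a version number
    version :: Parser Int
    version = string "v" *> decimal

    main :: IO ()
    main = print (parseOnly version "v42")   -- prints: Right 42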

    What benefit do the pipes/conduit version give me as compared to the plain vanilla attoparsec?

    They work with pipes and conduit, where the vanilla one won't (at least not out-of-the-box).
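    To make that concrete, here is roughly what the conduit-wrapped version of the same (made-up) parser looks like, assuming conduit-extra's Data.Conduit.Binary and Data.Conduit.Attoparsec; "header.bin" is a placeholder file name:

    {-# LANGUAGE OverloadedStrings #-}
    import Data.Conduit (runConduitRes, (.|))
    import Data.Conduit.Binary (sourceFile)
    import Data.Conduit.Attoparsec (sinkParser)
    import Data.Attoparsec.ByteString.Char8 (Parser, string, decimal)

    version :: Parser Int
    version = string "v" *> decimal

    main :: IO ()
    main = do
        -- sinkParser turns the attoparsec Parser into a conduit sink: the file is
        -- fed to it in chunks, and no more chunks are pulled once the parser is done
        v <- runConduitRes $ sourceFile "header.bin" .| sinkParser version
        print v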

    If you don't use conduit or pipes, and you're satisfied with the current performance of your lazy IO, there's no need to change your current flow, especially if you're not writing a big application or processing large files. You can simply use attoparsec.

    However, that assumes that you know the drawbacks of lazy IO.

    What's the matter with lazy IO? (A case study: withFile)

    Let's not forget your first question:

    What problem do these libraries solve exactly?

    They solve the streaming-data problem (see references 1 and 3) that occurs in functional languages with lazy IO. Lazy IO sometimes doesn't give you what you want (see the example below), and it is often hard to determine the actual system resources needed by a specific lazy operation (is the data read/written in chunks, in bytes, buffered, on close, on open…?).

    Example for over-laziness

    import System.IO

    -- hGetContents is lazy: nothing is read until the string is actually forced
    main = withFile "myfile" ReadMode hGetContents
           >>= return . (take 5)
           >>= putStrLn
    

    This won't print anything, since the data is only evaluated inside putStrLn, and by that point the handle has already been closed.

    Fixing fire with poisonous acid

    While the following snippet fixes this, it has another nasty feature:

    main = withFile "myfile" ReadMode $ \handle -> 
               hGetContents handle
           >>= return . (take 5)
           >>= putStrLn
    

    In this case hGetContents will read all of the file, which you probably didn't expect at first. If you just want to check the magic bytes of a file that could be several GB in size, this is not the way to go.
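    If all you actually need is a handful of leading bytes, a strict, bounded read sidesteps lazy IO entirely. A small sketch, reusing the "myfile" name from the snippets above:

    import System.IO (withFile, IOMode (ReadMode))
    import qualified Data.ByteString as BS

    main :: IO ()
    main = do
        -- BS.hGet reads at most 5 bytes into a strict ByteString, so the file
        -- size doesn't matter and the handle is closed as soon as withFile returns
        magic <- withFile "myfile" ReadMode (\h -> BS.hGet h 5)
        print magic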

    Using withFile correctly

    The solution is, obviously, to do the work inside the withFile context:

    main = withFile "myfile" ReadMode $ \handle -> 
               fmap (take 5) (hGetContents handle)
           >>= putStrLn
    

    This is, by the way, also the solution mentioned by the author of pipes:

    This [..] answers a question people sometimes ask me about pipes, which I will paraphrase here:

    If resource management is not a core focus of pipes, why should I use pipes instead of lazy IO?

    Many people who ask this question discovered stream programming through Oleg, who framed the lazy IO problem in terms of resource management. However, I never found this argument compelling in isolation; you can solve most resource management issues simply by separating resource acquisition from the lazy IO, like this: [see last example above]

    Which brings us back to my previous statement:

    You can simply use attoparsec [...][with lazy IO, assuming] that you know the drawbacks of lazy IO.
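    For comparison, here is the same "first five bytes" task written against conduit, where the amount read is stated explicitly and the handle's lifetime is tied to the pipeline. A sketch assuming conduit-extra's Data.Conduit.Binary, with "myfile" as above:

    import Data.Conduit (runConduitRes, (.|))
    import qualified Data.Conduit.Binary as CB
    import qualified Data.ByteString.Lazy.Char8 as BL

    main :: IO ()
    main = do
        -- CB.take 5 pulls at most five bytes from the source; at most one chunk of
        -- the file is read, and the handle is released when the pipeline finishes
        bytes <- runConduitRes $ CB.sourceFile "myfile" .| CB.take 5
        BL.putStrLn bytes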

    References

    1. Iteratee I/O, which explains the example better and provides a better overview
    2. Gabriel Gonzalez (maintainer/author of pipes): Reasoning about stream programming
    3. Michael Snoyman (maintainer/author of conduit): Conduit versus Enumerator
