I want to use xml-conduit, specifically Text.XML.Stream.Parse in order to lazily extract a list of objects from a large XML file.
As a test case, I use the recently
Let me start by saying that the streaming helper API in xml-conduit has not been worked on in years, and could probably benefit from a reimagining given the changes that have happened to conduit in the interim. I think there are likely much better ways to accomplish things.
That said, let me explain the problem you're seeing. The many function creates a list of results, and will not produce any values until it has finished processing. In your case, there are so many values that this appears to never happen. Ultimately, when the entire file has been read, the entire list of users will be displayed at once. But that's clearly not the behavior you're looking for.
Instead, you want to create a stream of User values which are produced as soon as they're ready. To get that, replace the many function call with a new function which will yield a result each time one is parsed. A simple implementation of this could be:

    yieldWhileJust :: Monad m
                   => ConduitM a b m (Maybe b)
                   -> Conduit a m b
    yieldWhileJust consumer =
        loop
      where
        loop = do
            mx <- consumer
            case mx of
                Nothing -> return ()
                Just x -> yield x >> loop

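To see the helper in isolation, here is a minimal sketch that leaves xml-conduit out entirely. It assumes conduit 1.3 or later, so it uses ConduitT, runConduit and .| rather than the older names in this answer. The sumPair consumer and the pairSums function are hypothetical stand-ins for a real parser such as parseUserRow.

```haskell
import Data.Conduit
import qualified Data.Conduit.List as CL

-- Same logic as the helper above, written against the conduit >= 1.3 types
-- (an assumption about the installed version).
yieldWhileJust :: Monad m => ConduitT a b m (Maybe b) -> ConduitT a b m ()
yieldWhileJust consumer = loop
  where
    loop = do
        mx <- consumer
        case mx of
            Nothing -> return ()
            Just x  -> yield x >> loop

-- Hypothetical consumer: reads two inputs and returns their sum, or Nothing
-- once fewer than two inputs remain (a trailing odd element is dropped).
sumPair :: Monad m => ConduitT Int o m (Maybe Int)
sumPair = do
    ma <- await
    mb <- await
    return ((+) <$> ma <*> mb)

-- Pure variant that collects the yielded values, handy for checking results.
pairSums :: [Int] -> [Int]
pairSums xs =
    runConduitPure $ CL.sourceList xs .| yieldWhileJust sumPair .| CL.consume

-- Each sum is handed to print as soon as it has been yielded.
main :: IO ()
main = runConduit $
    CL.sourceList [1 .. 6] .| yieldWhileJust sumPair .| CL.mapM_ print
```

Running main prints 3, 7 and 11 on separate lines. Each value reaches the printing stage as soon as its pair has been consumed, which is the behavior you want for the User rows.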
Also, instead of using putStrLn $ unlines $ map show, you want to attach the entire pipeline to a consumer which will print each individually yielded User value. This can be implemented easily with Data.Conduit.List.mapM_, e.g.: CL.mapM_ (liftIO . print).
I've put together a full example based on your code. The input is an artificially generated infinite XML file, just to prove the point that it really is yielding output immediately.

    {-# LANGUAGE OverloadedStrings #-}
    {-# LANGUAGE RankNTypes        #-}
    import           Control.Applicative    ((<*))
    import           Control.Concurrent     (threadDelay)
    import           Control.Monad          (forever, void)
    import           Control.Monad.IO.Class (MonadIO (liftIO))
    import           Data.ByteString        (ByteString)
    import           Data.Conduit
    import qualified Data.Conduit.List      as CL
    import           Data.Text              (Text)
    import           Data.Text.Encoding     (encodeUtf8)
    import           Data.XML.Types         (Event)
    import           Text.XML.Stream.Parse

    -- instead of actually including a large input data file, just for testing purposes
    infiniteInput :: MonadIO m => Source m ByteString
    infiniteInput = do
        yield "<users>"
        forever $ do
            -- Placeholder row: any <row> element carrying a DisplayName
            -- attribute will do; other attributes are skipped by ignoreAttrs.
            yield $ encodeUtf8
                "<row DisplayName=\"Example User\"/>"
            liftIO $ threadDelay 1000000
        --yield "</users>" -- will never be reached

    data User = User {name :: Text} deriving (Show)

    parseUserRow :: MonadThrow m => Consumer Event m (Maybe User)
    parseUserRow = tagName "row" (requireAttr "DisplayName" <* ignoreAttrs) $ \displayName -> do
        return $ User displayName

    parseUsers :: MonadThrow m => Conduit Event m User
    parseUsers = void $ tagNoAttr "users" $ yieldWhileJust parseUserRow

    yieldWhileJust :: Monad m
                   => ConduitM a b m (Maybe b)
                   -> Conduit a m b
    yieldWhileJust consumer =
        loop
      where
        loop = do
            mx <- consumer
            case mx of
                Nothing -> return ()
                Just x -> yield x >> loop

    main :: IO ()
    main = infiniteInput
        $$ parseBytes def
        =$ parseUsers
        =$ CL.mapM_ print
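
If you are on newer library versions, the example above needs some renaming, because conduit 1.3 deprecated $$, =$ and the Conduit/Source/Consumer synonyms in favor of runConduit, .| and ConduitT. The following is a rough sketch of the same pipeline under those assumptions. It also assumes a recent xml-conduit in which, as far as I know, tag' takes the place of tagName and manyYield does the job of yieldWhileJust. Check those names against the version you have installed. The two-row input is made up for illustration.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import           Control.Monad         (void)
import           Control.Monad.Catch   (MonadThrow)
import           Data.Conduit
import qualified Data.Conduit.List     as CL
import           Data.Text             (Text)
import           Data.XML.Types        (Event)
import           Text.XML.Stream.Parse

data User = User {name :: Text} deriving (Show)

-- Parse one <row> element, keeping DisplayName and ignoring other attributes.
parseUserRow :: MonadThrow m => ConduitT Event o m (Maybe User)
parseUserRow =
    tag' "row" (requireAttr "DisplayName" <* ignoreAttrs) (return . User)

-- manyYield (assumed available in the installed xml-conduit) yields each
-- parsed row downstream instead of accumulating a list.
parseUsers :: MonadThrow m => ConduitT Event User m ()
parseUsers = void $ tagNoAttr "users" $ manyYield parseUserRow

main :: IO ()
main = runConduit $
    -- Illustrative input only; a real program would stream a file here.
    yield "<users><row DisplayName=\"a\"/><row DisplayName=\"b\"/></users>"
        .| parseBytes def
        .| parseUsers
        .| CL.mapM_ print
```

The structure is unchanged from the original example: one parser per row, a streaming loop around it, and a sink that prints each User as it arrives.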