I'm parsing a modified XML from http://hackage.haskell.org/package/xml-conduit-1.1.0.9/docs/Text-XML-Stream-Parse.html
Here's what it looks like:
<?xml version="1.0" encoding="utf-8"?>
<population xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://example.com">
<success>true</success>
<row_count>2</row_count>
<summary>
<bananas>0</bananas>
</summary>
<people>
<person>
<firstname>Michael</firstname>
<age>25</age>
</person>
<person>
<firstname>Eliezer</firstname>
<age>2</age>
</person>
</people>
</population>
How do I get a list of firstname
and age
for every person?
My goal is to use http-conduit to download this xml and then parse it, but I am looking for a solution on how to parse when there are no attributes (use tagNoAttrs?)
Here's what I've tried, and I've added my questions in the Haskell comments:
{-# LANGUAGE OverloadedStrings #-}
import Control.Monad.Trans.Resource
import Data.Conduit (($$))
import Data.Text (Text, unpack)
import Text.XML.Stream.Parse
import Control.Applicative ((<*))
data Person = Person Int Text
deriving Show
-- Do I need to change the lambda function \age to something else to get both name and age?
parsePerson = tagNoAttr "person" $ \age -> do
name <- content -- How do I get age from the content? "unpack" is for attributes
return $ Person age name
parsePeople = tagNoAttr "people" $ many parsePerson
-- This doesn't ignore the xmlns attributes
parsePopulation = tagName "population" (optionalAttr "xmlns" <* ignoreAttrs) $ parsePeople
main = do
people <- runResourceT $
parseFile def "people2.xml" $$ parsePopulation
print people
Firstly: parsing combinators in xml-conduit haven't been updated in quite a while, and show their age. I recommend most people to use the DOM or cursor interface instead. That said, let's look at your example. There are two problems with your code:
- It doesn't properly handle XML namespaces. All of the element names are in the
http://example.com
namespace, and your code needs to reflect that. - The parsing combinators demand that you account for all elements. They won't automatically skip over some elements for you.
So here's an implementation using the streaming API that gets the desired result:
{-# LANGUAGE OverloadedStrings #-}
import Control.Monad.Trans.Resource (runResourceT)
import Data.Conduit (Consumer, ($$))
import Data.Text (Text)
import Data.Text.Read (decimal)
import Data.XML.Types (Event)
import Text.XML.Stream.Parse
data Person = Person Int Text
deriving Show
-- Do I need to change the lambda function \age to something else to get both name and age?
parsePerson :: MonadThrow m => Consumer Event m (Maybe Person)
parsePerson = tagNoAttr "{http://example.com}person" $ do
name <- force "firstname tag missing" $ tagNoAttr "{http://example.com}firstname" content
ageText <- force "age tag missing" $ tagNoAttr "{http://example.com}age" content
case decimal ageText of
Right (age, "") -> return $ Person age name
_ -> force "invalid age value" $ return Nothing
parsePeople :: MonadThrow m => Consumer Event m [Person]
parsePeople = force "no people tag" $ do
_ <- tagNoAttr "{http://example.com}success" content
_ <- tagNoAttr "{http://example.com}row_count" content
_ <- tagNoAttr "{http://example.com}summary" $
tagNoAttr "{http://example.com}bananas" content
tagNoAttr "{http://example.com}people" $ many parsePerson
-- This doesn't ignore the xmlns attributes
parsePopulation :: MonadThrow m => Consumer Event m [Person]
parsePopulation = force "population tag missing" $
tagName "{http://example.com}population" ignoreAttrs $ \() -> parsePeople
main :: IO ()
main = do
people <- runResourceT $
parseFile def "people2.xml" $$ parsePopulation
print people
Here's an example using the cursor API. Note that it has different error handling characteristics, but should produce the same result for well-formed input.
{-# LANGUAGE OverloadedStrings #-}
import Text.XML
import Text.XML.Cursor
import Data.Text (Text)
import Data.Text.Read (decimal)
import Data.Monoid (mconcat)
main :: IO ()
main = do
doc <- Text.XML.readFile def "people2.xml"
let cursor = fromDocument doc
print $ cursor $// element "{http://example.com}person" >=> parsePerson
data Person = Person Int Text
deriving Show
parsePerson :: Cursor -> [Person]
parsePerson c = do
let name = c $/ element "{http://example.com}firstname" &/ content
ageText = c $/ element "{http://example.com}age" &/ content
case decimal $ mconcat ageText of
Right (age, "") -> [Person age $ mconcat name]
_ -> []
来源:https://stackoverflow.com/questions/22748303/get-all-names-from-xml-conduit