Question
I have to handle rather big XML files, and I want to use the streaming API of xml-conduit to go through them and extract the info I need.
Streaming with xml-conduit is especially appealing in my case because I don't need much data from these files and I only need to perform simple aggregations on it, so conduits are a perfect fit.
Now, I don't always know the exact structure of the file. The files are generated by different versions of (sometimes buggy) software around the world, so I can't impose a schema.
I do know, however, which elements I am interested in and what they look like. But, as I said, these elements can appear in any order, interleaved with other elements.
What I need, I guess, is to skip all the elements I am not interested in and consider only the ones I want.
I initially wanted to write something like this:
tagName "person" (requireAttr "age" <* ignoreAttrs) <|> ignoreTag (const True)
but it wouldn't compile because ignoreTag returns Maybe ()
What would be the way to skip all the "unknown" tags when using the xml-conduit streaming API?
Answer 1:
As proposed here
λ> runConduit $ Text.XML.Stream.Parse.parseLBS def "<foo>bar</foo><person age=\"25\">Michael</person><person age=\"2\">Eliezer</person>" .| many_ (choose [takeTree "person" ignoreAttrs, ignoreAnyTreeContent]) .| manyYield parsePerson .| Data.Conduit.List.consume
[Person 25 "Michael",Person 2 "Eliezer"]
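The GHCi line above relies on a Person type and a parsePerson parser that are not shown. Below is a minimal self-contained sketch of the same pipeline, assuming definitions modelled on the example in the xml-conduit documentation; Person, parsePerson, extractPeople and sampleXml are illustrative names, not part of the library. It assumes the bytestring, conduit, exceptions, text, xml-conduit and xml-types packages are available.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main (main) where

import Control.Monad.Catch (MonadThrow)
import qualified Data.ByteString.Lazy as L
import Data.Conduit (ConduitT, runConduit, (.|))
import qualified Data.Conduit.List as CL
import Data.Text (Text, unpack)
import Data.XML.Types (Event)
import Text.XML.Stream.Parse

-- Hypothetical record for the data we want to keep.
data Person = Person Int Text
  deriving (Show, Eq)

-- Parse one <person age="..."> element; extra attributes are ignored.
-- Note: 'read' fails at runtime if the age attribute is not a number.
parsePerson :: MonadThrow m => ConduitT Event o m (Maybe Person)
parsePerson = tag' "person" (requireAttr "age" <* ignoreAttrs) $ \age -> do
  name <- content
  return (Person (read (unpack age)) name)

-- First stage: forward the events of every <person> tree and drop
-- any other tree or content. Second stage: parse what was forwarded.
extractPeople :: L.ByteString -> IO [Person]
extractPeople input = runConduit $
     parseLBS def input
  .| many_ (choose [takeTree "person" ignoreAttrs, ignoreAnyTreeContent])
  .| manyYield parsePerson
  .| CL.consume

-- Same input as in the GHCi session above.
sampleXml :: L.ByteString
sampleXml =
  "<foo>bar</foo><person age=\"25\">Michael</person><person age=\"2\">Eliezer</person>"

main :: IO ()
main = extractPeople sampleXml >>= print
```

The filtering stage is what sidesteps the type error from the question: takeTree and ignoreAnyTreeContent both return Maybe (), so they can share one choose list, and the parser that returns Maybe Person runs in a separate stage on the events that were let through.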
Source: https://stackoverflow.com/questions/42265047/how-to-skip-elements-in-xml-conduit