Question
I'm writing a parser in Haskell for a website, using the modules Text.XML and Text.XML.Cursor.
The page contains unclosed tags, and I get this error:
Main.hs: Error parsing XML file dat.html: 29:1-29:8: Expected end element for: Name {nameLocalName = "br", nameNamespace = Nothing, namePrefix = Nothing}, but received: EventEndElement (Name {nameLocalName = "body", nameNamespace = Nothing, namePrefix = Nothing})
What should I do? How can I ignore such tags?
Answer 1:
A text object with unclosed tags is not well-formed and is therefore not XML.
So forget about using XML libraries, parsers, or tools on it as-is. By definition and design, they must reject input that is not well-formed, so they cannot help you.
You have two options:
- Repair the text so that it is well-formed by closing the unclosed tags. You might do this manually or try a tool such as HTML Tidy.
- Define a new data format that allows unclosed tags, and write a parser for it from the ground up.
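For the first option, the `br` element from the error message is a typical case: HTML allows `<br>` with no end tag, while XML needs `<br/>`. Below is a minimal sketch of such a repair step, assuming the only problem is HTML void elements written without a closing slash. The names `voidElements`, `closeVoidTags`, `isVoidStart`, and `selfClose` are hypothetical helpers, not part of Text.XML. The sketch uses only `base`, and it does not handle comments, CDATA sections, attribute values that contain `>`, or other kinds of unclosed tags.

```haskell
import Data.Char (isAlphaNum, isSpace, toLower)
import Data.List (isSuffixOf)

-- | HTML void elements, which never take an end tag.
--   Assumed list for illustration; extend it as needed.
voidElements :: [String]
voidElements =
  [ "area", "base", "br", "col", "embed", "hr", "img"
  , "input", "link", "meta", "source", "track", "wbr" ]

-- | Rewrite every void-element start tag as a self-closing tag,
--   e.g. "<br>" becomes "<br/>". All other text is copied unchanged.
closeVoidTags :: String -> String
closeVoidTags [] = []
closeVoidTags ('<' : rest)
  | (body, '>' : after) <- break (== '>') rest
  , isVoidStart body
  = '<' : selfClose body ++ closeVoidTags after
closeVoidTags (c : cs) = c : closeVoidTags cs

-- | True if the tag body (the text between '<' and '>') starts with
--   the name of a void element, followed by nothing, whitespace, or '/'.
isVoidStart :: String -> Bool
isVoidStart body = map toLower name `elem` voidElements && boundary rest
  where
    (name, rest)   = span isAlphaNum body
    boundary []    = True
    boundary (c:_) = isSpace c || c == '/'

-- | Append "/>" to the tag body, or just ">" if it already ends in '/'.
selfClose :: String -> String
selfClose body
  | "/" `isSuffixOf` trimmed = trimmed ++ ">"
  | otherwise                = trimmed ++ "/>"
  where
    trimmed = reverse (dropWhile isSpace (reverse body))
```

For example, `closeVoidTags "<p>a<br>b</p>"` evaluates to `"<p>a<br/>b</p>"`, and a tag that is already self-closing, such as `<br/>`, is left unchanged. The repaired text can then be handed to the XML parser, but it will still be rejected if the page is malformed in other ways.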
Source: https://stackoverflow.com/questions/34577021/how-to-ignore-unclosed-tags-in-xml-or-html