Which XML parser can handle an incomplete XML file?

星月不相逢 2021-01-28 19:56

I am trying to parse an XML file using a SAX parser but keep getting the error "XML document structures must start and end within the same entity", which is expected as the XML doc I

2 Answers
  • 2021-01-28 19:59

    You cannot use an XML parser to parse a file that does not contain well-formed XML. (It does not have to be valid, just well-formed. For the difference, read Well-formed vs Valid XML.)

    By definition, XML must be well-formed, otherwise it is not XML. Parsers in general have to have some fundamental constraints met in order to operate, and for XML parsers, it is well-formedness.

    Either repair the file manually first to be well-formed XML, or open it programmatically and parse it as a text file using traditional parsing techniques. An XML parser cannot help you unless you have well-formed XML.
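As a rough illustration of the point above, a strict parser rejects a truncated document outright instead of returning a partial result. The sketch below uses Python's standard-library `xml.etree.ElementTree`; the helper name `is_well_formed` is hypothetical and only serves the example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if `text` parses as well-formed XML, False otherwise.

    Hypothetical helper for illustration: it only checks well-formedness,
    not validity against any schema or DTD.
    """
    try:
        ET.fromstring(text)
    except ET.ParseError:
        # Raised for any well-formedness violation, including truncation.
        return False
    return True


if __name__ == "__main__":
    print(is_well_formed("<a><b>foo</b></a>"))     # complete document
    print(is_well_formed("<a><b>foo</b><b>bar<"))  # truncated document
```

A check like this can decide whether a file goes to the XML parser or to a repair or text-processing step first.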

  • 2021-01-28 20:06

    BeautifulSoup in Python handles incomplete XML really well. I use it to parse the prefix of large XML files for preview. (The 'xml' parser used below requires lxml to be installed.)

    >>> from bs4 import BeautifulSoup
    >>> BeautifulSoup('<a><b>foo</b><b>bar<','xml')
    <?xml version="1.0" encoding="unicode-escape"?>\n<a><b>foo</b><b>bar</b></a>
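If installing BeautifulSoup is not an option, a similar preview can probably be approximated with the standard library's incremental `xml.etree.ElementTree.XMLPullParser`. This is a swapped-in technique, not what the answer above uses: it does not repair the document, it only reports the elements that were fully closed before the input stopped or went wrong. The function name `preview_elements` is hypothetical.

```python
import xml.etree.ElementTree as ET


def preview_elements(prefix):
    """Collect (tag, text) for every element fully closed within `prefix`.

    Hypothetical helper: the input may be a truncated XML document.
    Elements still open when the input ends are not reported.
    """
    parser = ET.XMLPullParser(events=("end",))
    parser.feed(prefix)  # close() is deliberately not called: the input is incomplete
    completed = []
    try:
        for _event, elem in parser.read_events():
            completed.append((elem.tag, elem.text))
    except ET.ParseError:
        # Malformed markup inside the prefix: keep what was seen before it.
        pass
    return completed


if __name__ == "__main__":
    print(preview_elements("<a><b>foo</b><b>bar<"))
```

Unlike the BeautifulSoup result, the unfinished `<b>bar` element and the enclosing `<a>` are simply left out, so this suits previews where only complete records matter.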
    