How to parse invalid (bad / not well-formed) XML?

自闭症患者 2020-11-21 04:44

Currently, I'm working on a feature that involves parsing XML that we receive from another product. I decided to run some tests against some actual customer data, and it lo

4 Answers
  •  离开以前
    2020-11-21 05:23

    The accepted answer is good advice, and contains very useful links.

    I'd like to add that this, and many other cases of XML that is not well-formed and/or DTD-invalid, can be repaired using SGML, the ISO-standardized superset of HTML and XML. In your case, what works is to declare the bogus THIS-IS-PART-OF-DESCRIPTION element as an SGML empty element and then use e.g. the osx program (part of the OpenSP/OpenJade SGML package) to convert it to XML. For example, if you supply the following to osx:

    <!DOCTYPE xml [
      <!ELEMENT (xml|description) - - ANY>
      <!ELEMENT THIS-IS-PART-OF-DESCRIPTION - - EMPTY>
    ]>
    <xml>
      <description>blah blah
        <THIS-IS-PART-OF-DESCRIPTION>
      </description>
    </xml>
    

    it will output well-formed XML for further processing with the XML tools of your choice.
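As a rough sketch of how this step could be scripted, the Python snippet below writes an SGML document to a temporary file and runs osx on it, capturing whatever is printed to standard output. Several things here are assumptions rather than part of the answer: the helper name `repair_with_osx`, the executable being reachable as `osx` on the PATH, the exact DTD declarations in `SGML_INPUT`, and the choice to ignore the exit status (osx may report warnings even when it produces usable output).

```python
import shutil
import subprocess
import tempfile
from pathlib import Path
from typing import Optional

# Assumed input: a DTD that declares the bogus element as EMPTY,
# followed by the not well-formed document itself.
SGML_INPUT = """<!DOCTYPE xml [
  <!ELEMENT (xml|description) - - ANY>
  <!ELEMENT THIS-IS-PART-OF-DESCRIPTION - - EMPTY>
]>
<xml>
  <description>blah blah
    <THIS-IS-PART-OF-DESCRIPTION>
  </description>
</xml>
"""


def repair_with_osx(sgml_text: str, osx_path: str = "osx") -> Optional[str]:
    """Run osx on sgml_text and return its standard output.

    Returns None if the executable cannot be found or nothing was printed.
    The helper name and the default executable name are hypothetical.
    """
    exe = shutil.which(osx_path)
    if exe is None:
        return None
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "input.sgml"
        src.write_text(sgml_text, encoding="utf-8")
        # No flags are passed; the exit status is deliberately not checked.
        result = subprocess.run(
            [exe, str(src)], capture_output=True, text=True, check=False
        )
    return result.stdout or None


if __name__ == "__main__":
    xml_text = repair_with_osx(SGML_INPUT)
    print(xml_text if xml_text is not None else "osx not found or no output")
```

The returned string can then be handed to whichever XML parser you already use. Anything osx writes to standard error is discarded here, so inspect `result.stderr` if the output looks wrong.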

    Note, however, that your example snippet has another problem: element names beginning with the letters xml, in any combination of upper and lower case (XML, Xml, etc.), are reserved by the XML specification. Many parsers accept such names anyway, but they should be avoided.
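The naming rule above is easy to check mechanically before data goes any further. The following is a minimal sketch; the function name `has_reserved_xml_prefix` is made up for illustration, and it only tests the three-letter prefix, so it will also flag legitimate reserved names such as `xml:lang`.

```python
import re

# Matches names whose first three letters are x, m, l in any mix of case.
_RESERVED_PREFIX = re.compile(r"^xml", re.IGNORECASE)


def has_reserved_xml_prefix(name: str) -> bool:
    """Return True if name starts with 'xml' in any letter case."""
    return _RESERVED_PREFIX.match(name) is not None


if __name__ == "__main__":
    names = ["xml", "XmlData", "description"]
    flagged = [n for n in names if has_reserved_xml_prefix(n)]
    print(flagged)  # ['xml', 'XmlData']
```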
