Currently, I'm working on a feature that involves parsing XML that we receive from another product. I decided to run some tests against some actual customer data, and it lo
The accepted answer gives good advice and contains very useful links.
I'd like to add that this and many other cases of not-well-formed and/or DTD-invalid XML can be repaired using SGML, the ISO-standardized superset of HTML and XML. In your case, what works is to declare the bogus THIS-IS-PART-OF-DESCRIPTION element as an SGML EMPTY element and then use, e.g., the osx program (part of the OpenSP/OpenJade SGML package) to convert the document to XML. For example, if you supply the following to osx:
]>
blah blah
it will output well-formed XML for further processing with the XML tools of your choice.
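The approach above can be scripted. The following is a minimal Python sketch, not the exact input from this answer. It assumes a hypothetical root element named `doc`, that every element in the fragment is either the root or one of the bogus elements listed as empty (real data would need declarations for its other elements too), that `osx` is on the PATH and reads the document from standard input when no file argument is given, and that OpenSP's default SGML declaration accepts the element names used. The helper names `build_sgml_document` and `sgml_to_xml` are illustrative only.

```python
import shutil
import subprocess


def build_sgml_document(body: str, root: str, empty_elements: list[str]) -> str:
    """Prefix a not-well-formed fragment with an SGML DOCTYPE and internal subset.

    The root element gets ANY content; each bogus element is declared EMPTY
    with an omissible end tag ("- O"), so an SGML parser knows it has no
    content and no closing tag.
    """
    decls = [f"<!ELEMENT {root} - - ANY>"]
    decls += [f"<!ELEMENT {name} - O EMPTY>" for name in empty_elements]
    subset = "\n".join(decls)
    return f"<!DOCTYPE {root} [\n{subset}\n]>\n{body}\n"


def sgml_to_xml(sgml_text: str) -> tuple[str, str]:
    """Pipe the SGML text into OpenSP's osx; return (xml_output, diagnostics).

    Assumption: osx reads standard input when no file argument is given.
    """
    if shutil.which("osx") is None:
        raise FileNotFoundError("osx (from OpenSP) was not found on PATH")
    result = subprocess.run(["osx"], input=sgml_text, capture_output=True, text=True)
    return result.stdout, result.stderr


if __name__ == "__main__":
    # Hypothetical fragment: root "doc" with an unclosed bogus tag inside it.
    fragment = "<doc>blah blah<THIS-IS-PART-OF-DESCRIPTION>blah blah</doc>"
    sgml = build_sgml_document(fragment, "doc", ["THIS-IS-PART-OF-DESCRIPTION"])
    print(sgml)
    if shutil.which("osx"):
        xml_out, diagnostics = sgml_to_xml(sgml)
        print(xml_out)
        print(diagnostics)
```

Declaring the element EMPTY is what tells the SGML parser that the unclosed tag needs no end tag; any errors osx reports are returned as diagnostics rather than raised, so you can inspect them alongside the output.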
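Note, however, that your example snippet has another problem: element names starting with the letters xml, XML, Xml, or any other upper/lower-case combination of those three letters are reserved by the XML specification for future standardization. You should avoid them, even though this is a naming rule rather than a well-formedness constraint, and many XML parsers will in practice still accept such names.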
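Since a parser may accept these names silently, a small check can flag them. This is a minimal sketch using Python's standard-library ElementTree; the function name `reserved_element_names` is hypothetical, and it only inspects element names, not attribute names or processing-instruction targets.

```python
import xml.etree.ElementTree as ET


def reserved_element_names(xml_text: str) -> list[str]:
    """Return element names whose first three letters are x/X, m/M, l/L.

    Such names are reserved by XML 1.0 but are not a well-formedness error,
    so the parser accepts the document and the names are flagged here.
    """
    root = ET.fromstring(xml_text)
    found: list[str] = []
    for elem in root.iter():  # root first, then descendants in document order
        local = elem.tag.rsplit("}", 1)[-1]  # drop any "{namespace}" prefix
        if local[:3].lower() == "xml" and local not in found:
            found.append(local)
    return found


if __name__ == "__main__":
    print(reserved_element_names("<xml><XMLData/><item/><Xml_x/></xml>"))
```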