Question
We have a Java application that pulls data from SAP, parses it, and renders it for users. The data is pulled using the JCO connector.
Recently the parser threw this exception:
org.xml.sax.SAXParseException: Character reference "�" is an invalid XML character.
So, we are planning to write a new level of indirection where ALL special/illegal characters are replaced BEFORE parsing the XML.
My questions are:
- Is there an existing (open source) utility that replaces illegal characters in XML?
- If I had to write such a utility myself, how should I handle them?
- Why is the above exception thrown?
Thank You.
Answer 1:
From my point of view, the source (SAP) should do the replacement. Otherwise, what it transmits to your program may look like XML, but is not.
While replacing '&' with '&amp;' can be done with a simple String.replaceAll(...) on the string returned by the toXML() call, other characters can be harder to replace ('<' and '>', for example).
regards Guillaume
Answer 2:
It sounds like a bug in their escaping. Depending on context you might be best off just writing your own version of their XMLWriter class that uses a real XML library rather than trying to write your own XML utilities like the SAP developers did.
Alternatively, looking at the character code, �, you might be able to get away with a replace all on it with the empty string:
String goodXml = badXml.replaceAll("�", "");
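A slightly more general version of that idea is sketched below. It assumes the offending text is a well-formed numeric character reference (for example `&#0;` or `&#x1;`, terminated by a semicolon) and removes every reference whose code point falls outside the XML 1.0 "Char" production, which is the rule a conforming parser enforces when it rejects such a reference. The class and method names are hypothetical, and the sketch is not CDATA-aware: text inside a CDATA section that merely looks like a reference would be removed too.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of JCO or any SAP library.
final class XmlCharRefCleaner {
    // Matches decimal (&#65;) and hexadecimal (&#x41;) character references.
    private static final Pattern CHAR_REF =
            Pattern.compile("&#(?:x([0-9A-Fa-f]+)|([0-9]+));");

    private XmlCharRefCleaner() {}

    // True if the code point is allowed by the XML 1.0 "Char" production.
    static boolean isValidXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Drops references to illegal code points; keeps everything else verbatim.
    static String stripInvalidCharRefs(String xml) {
        Matcher m = CHAR_REF.matcher(xml);
        StringBuilder out = new StringBuilder(xml.length());
        int last = 0;
        while (m.find()) {
            out.append(xml, last, m.start());
            if (isValidReference(m.group(1), m.group(2))) {
                out.append(m.group());
            }
            last = m.end();
        }
        out.append(xml, last, xml.length());
        return out.toString();
    }

    private static boolean isValidReference(String hex, String dec) {
        try {
            int cp = (hex != null) ? Integer.parseInt(hex, 16) : Integer.parseInt(dec, 10);
            return isValidXml10Char(cp);
        } catch (NumberFormatException tooLarge) {
            // The value overflows int, so it cannot be a legal code point.
            return false;
        }
    }
}
```

Calling `XmlCharRefCleaner.stripInvalidCharRefs(badXml)` before handing the string to the SAX parser keeps legal references such as `&#65;` intact and only drops the ones the parser would reject.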
Answer 3:
I've had a related, but opposite problem, where I was trying to insert character 1 into the output of an XSLT transformation. I considered post-processing to replace a marker with the zero, but instead chose to use an xsl:param.
If I were in your situation, I'd either come up with a bespoke encoding that replaces the characters which are invalid in XML and handle them as special cases in your parsing, or, if possible, replace them with whitespace.
I don't have experience with JCO, so can't advise on how or where I'd replace the invalid characters.
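The whitespace option could look roughly like the sketch below. It assumes the invalid characters arrive as raw characters in a Java String (not as `&#...;` references) and replaces each code point outside the XML 1.0 "Char" production with a caller-supplied character. The class and method names are hypothetical, and where in the JCO pipeline this would be called is left open.

```java
// Hypothetical helper: replaces raw characters that XML 1.0 does not allow.
final class XmlRawCharCleaner {
    private XmlRawCharCleaner() {}

    // True if the code point is allowed by the XML 1.0 "Char" production.
    static boolean isValidXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Walks the string by code point so surrogate pairs stay intact;
    // an unpaired surrogate counts as invalid and is replaced as well.
    static String replaceInvalidChars(String text, char replacement) {
        StringBuilder out = new StringBuilder(text.length());
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            if (isValidXml10Char(cp)) {
                out.appendCodePoint(cp);
            } else {
                out.append(replacement);
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }
}
```

For example, `XmlRawCharCleaner.replaceInvalidChars(text, ' ')` keeps tabs, line breaks, and ordinary text, and turns control characters such as NUL into spaces. Replacing rather than deleting preserves character positions, which can make later error messages easier to map back to the original data.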
Answer 4:
You can encode/decode non-ASCII characters in XML by using the escapeXml method of the Apache Commons Lang class StringEscapeUtils. See:
http://commons.apache.org/lang/api-2.4/index.html
To read about how XML character references work, search for "numeric character references" on Wikipedia.
Source: https://stackoverflow.com/questions/2467830/xml-parsing-with-sax-how-to-handle-special-characters