I've been given an XML string which I need to put through a parser. It's currently complaining because of an illegal XML character. Very simplified example:
<someXml>this & that</someXml>
I know that the solution is to replace & with &amp;, but I'm not generating the XML and therefore have no control over the values.
A simple string replace is not the right way to do this, since '&' has special meaning in XML and a global replace of '&' with '&amp;' would ruin the special meaning which was intended. Is there a solution to take a full XML document and 'fix' it so that '&' becomes '&amp;', but only where intended? Am I safe to globally replace ' & ' with ' &amp; ' (note the spaces on either side)?
I think this is an interesting question, because it's a situation that can really happen in real life. Although I believe the right thing to do is to ask the XML provider to fix the XML and make it valid, I thought one option was to try a lenient parser. I did some searching and found this blog post discussing the same problem and suggesting the same solution I was thinking of. You may try jsoup. Let me repeat that I think this is not the best thing to do: you should really ask the XML provider to fix it.
I would suggest asking the provider of this document to fix it. As it is, it's not (valid) XML! If they committed themselves to the XML format, they should fix it.
You can't do this, because you would destroy the XML's special characters by encoding them. The fix has to be made in the code or library that generates the XML.
Why not use a CDATA section inside any XML tag holding additional XML content? Then the lone ampersand would not be a problem.
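To illustrate the CDATA suggestion, here is a minimal sketch using the JDK's built-in DOM parser. It assumes the producer of the XML is willing to wrap the text in a CDATA section; the class and method names (CdataDemo, parseText) are made up for this example.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class CdataDemo {
    // Hypothetical helper: parses the XML and returns the root element's text.
    static String parseText(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            // Wrap checked parser exceptions so callers need no throws clause.
            throw new IllegalArgumentException("XML could not be parsed", e);
        }
    }

    public static void main(String[] args) {
        // Inside CDATA the lone ampersand is plain character data.
        System.out.println(parseText("<someXml><![CDATA[this & that]]></someXml>"));
    }
}
```

Note that this only helps when the party generating the XML adds the CDATA section; it does not repair a document that has already been produced with a bare ampersand.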
It is not clear if you are producing the XML yourself from this question, but if you are, you may want to use an XML library to do this, as it will handle encoding things properly in the first place.
But it sounds like this is a piece of XML that you were given, so I would recommend using Apache Commons Lang to do this. It has a class 'StringEscapeUtils' which has the method you are looking for, escapeXml(String).
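One caveat worth checking before relying on escapeXml: as far as I know it escapes every markup character, including < and >, so it is meant for text values rather than for a whole document whose tags must survive. If only the stray ampersands need repairing, a rough alternative is a regular expression with a negative lookahead that leaves anything that already looks like an entity reference alone. The sketch below uses only the JDK; the class name AmpersandFixer and the method escapeBareAmpersands are made up for this example, and the pattern is a heuristic, not a real XML repair tool.

```java
import java.util.regex.Pattern;

public class AmpersandFixer {
    // Matches '&' unless it starts something shaped like an entity reference:
    // a named one (&lt;), a decimal one (&#38;) or a hexadecimal one (&#x26;).
    private static final Pattern BARE_AMP = Pattern.compile(
            "&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)");

    // Hypothetical helper: escapes only the ampersands matched above.
    static String escapeBareAmpersands(String xml) {
        return BARE_AMP.matcher(xml).replaceAll("&amp;");
    }

    public static void main(String[] args) {
        System.out.println(escapeBareAmpersands(
                "<someXml>this & that, R&D, &lt; and &#38;</someXml>"));
    }
}
```

Unlike replacing ' & ' with spaces on either side, this also catches cases such as R&D. It is still a guess about intent: an ampersand inside a CDATA section or a comment would be rewritten as well, and text that merely resembles an entity (for example "AT&T;") would be left untouched. Asking the provider to fix the output remains the more reliable route.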
Source: https://stackoverflow.com/questions/5964200/how-can-i-make-my-xml-safe-for-parsing-when-it-has-character-in-it