How can I make my XML safe for parsing (when it has an & character in it)?

三世轮回 submitted on 2019-12-05 14:32:26

I think this is an interesting question, because it describes a situation that can really happen in real life. Although I believe the right thing to do is to ask the XML provider to fix the XML and make it well-formed, one option is to try a lenient parser. I did some searching and found a blog post about this same problem, suggesting the same solution I was thinking of: you could try jsoup. Let me repeat that I don't think this is the best approach: you should really ask the XML provider to fix it.
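jsoup is a third-party dependency (its XML mode is typically invoked as Jsoup.parse(xml, "", Parser.xmlParser())). As a dependency-free alternative, here is a minimal sketch of a different technique: escape every & that does not already start a predefined entity or a numeric character reference, then hand the result to the JDK's standard DOM parser. It assumes bare ampersands are the only defect; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper: repair bare '&' characters, then parse with the JDK DOM parser.
class AmpersandRepair {
    // An '&' NOT followed by one of the five predefined XML entities
    // or by a decimal/hexadecimal character reference.
    private static final Pattern BARE_AMP = Pattern.compile(
            "&(?!(?:amp|lt|gt|quot|apos|#[0-9]+|#x[0-9a-fA-F]+);)");

    // Turn every bare '&' into "&amp;"; existing references are left untouched.
    static String escapeBareAmpersands(String xml) {
        return BARE_AMP.matcher(xml).replaceAll("&amp;");
    }

    // Repair first, then parse. Reading from a character stream means the
    // parser disregards any encoding declaration in the text.
    static Document parseLeniently(String xml) {
        String repaired = escapeBareAmpersands(xml);
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(repaired)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            // Any other well-formedness problem (e.g. a stray '<') still ends up here.
            throw new IllegalArgumentException("XML is still not well-formed", e);
        }
    }
}
```

This is only a heuristic: an HTML-style reference such as &nbsp; is not one of the XML built-ins, so it is turned into &amp;nbsp; and survives as literal text. It also repairs nothing but ampersands.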

I would suggest asking the provider of this document to fix it. As it stands, it is not well-formed XML! If they committed themselves to the XML format, they should fix it.

You can't do this, because escaping the whole string also destroys the XML markup characters (it encodes them too). You should rewrite the code that generates the XML so that it uses an XML library.

Why not use a CDATA section inside any XML element that holds additional XML content? Then a lone ampersand would not be a problem, because the parser does not interpret & or < inside a CDATA section.
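A minimal sketch of the CDATA approach, assuming you control the code that inserts the text into the document (the class and method names are hypothetical). The one sequence a CDATA section cannot contain is "]]>", so the sketch splits it across two adjacent sections; the parsed text comes back unchanged.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class CdataSketch {
    // Wrap text in a CDATA section. Any "]]>" inside the text would end the
    // section early, so it is split into "]]" + "]]><![CDATA[" + ">".
    static String wrapInCdata(String text) {
        return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>";
    }

    // Parse a document and return the text of its root element; CDATA
    // sections come back as plain text with no entity decoding applied.
    static String rootText(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement()
                    .getTextContent();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException(e);
        }
    }
}
```

For example, "<data>" + CdataSketch.wrapInCdata("Tom & Jerry <b>") + "</data>" parses cleanly, and its root text is "Tom & Jerry <b>".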

It is not clear from the question whether you are producing the XML yourself, but if you are, you may want to use an XML library to do it, as it will escape special characters such as & properly in the first place.
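A minimal sketch of generating the XML with the JDK's built-in DOM and javax.xml.transform APIs instead of string concatenation. The element name, attribute name, and class name are hypothetical. Raw values go in unescaped, and the serializer escapes & and < in both text and attributes (and " inside attribute values) on output.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class XmlBuilderSketch {
    // Build <item title="...">text</item> through the DOM API.
    static String buildItem(String title, String text) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .newDocument();
            Element item = doc.createElement("item");
            item.setAttribute("title", title); // raw value, no manual escaping
            item.setTextContent(text);         // raw value, no manual escaping
            doc.appendChild(item);

            // Identity transform = serialize the DOM tree to a string.
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | TransformerException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Calling buildItem("A & B", "Tom & Jerry") produces output containing title="A &amp; B" and Tom &amp; Jerry, so the consumer can parse it with any standard parser.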

But it sounds like this is a piece of XML that you were given, so I would recommend using Apache Commons Lang to do this. Its StringEscapeUtils class has the method you are looking for: escapeXml(String).
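Commons Lang is a third-party dependency, and in Commons Lang 3 escapeXml is deprecated in favor of escapeXml10 and escapeXml11 (the StringEscapeUtils class itself was later deprecated in favor of the copy in Apache Commons Text). The sketch below is a hypothetical, dependency-free helper that escapes the five characters with predefined XML entities, roughly what those methods do for these characters. It is not the library's implementation. As the earlier answer warns, it is meant for individual text or attribute values, not a finished document.

```java
class EscapeSketch {
    // Escape the five characters that have predefined XML entities.
    // Apply to individual values before they are placed between tags;
    // escaping a whole document would turn its markup into text.
    static String escapeXmlText(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

Used as "<name>" + EscapeSketch.escapeXmlText(value) + "</name>", it yields well-formed output. Passing a whole document such as "<a>Tom & Jerry</a>" instead returns "&lt;a&gt;Tom &amp; Jerry&lt;/a&gt;", where the tags themselves have become text.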
