Java - Removing the double quotes in XML attributes

Posted by 爷,独闯天下 on 2019-12-20 07:20:29

Question


I have an xml string which I get via a REST call. However, some of the attributes have corrupted values. For example:

<property name="foo" value="Some corrupted String because of "something" like that"/>

How can I replace the double quotes that are neither preceded by value= nor followed by /> with single quotes, so that I get a valid XML string out of the corrupted one in Java 6?

EDIT:

I have tried to adapt this lookahead/lookbehind regex, which was written for Visual Basic. I could not produce a Java version of it, probably because the escape characters are incompatible. Here it is:

(?<=^[^""]*""(?>[^""]*""[^""]*"")*[^""]*)"(?! \s+ \w+=|\s* [/?]?" >)|(?<!\w+=)""(?=[^""]*""(?>[^""]*""[^""]*"")*[^""]*$)


Answer 1:


You can use the following regex:

\s+[\w:.-]+="([^"]*(?:"(?!\s+[\w:.-]+="|\s*(?:\/?|\?)>)[^"]*)*)"

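It matches any attribute name/value pair and captures the value in Group 1, which we can then modify in a callback. In the Java code, an extra capturing group around the attribute name and opening quote shifts the value to Group 2.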
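To make the pattern easier to read, here is a minimal sketch of the same regex written with Pattern.COMMENTS, so each piece carries its own explanation. The class name AnnotatedAttributePattern and the sample tag are illustrative assumptions, not part of the original answer; the pattern uses the two-group form, with the attribute name in Group 1 and the value in Group 2.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name, used only for illustration.
public class AnnotatedAttributePattern {

    // In COMMENTS mode, whitespace in the pattern is ignored and '#' starts a comment.
    public static final Pattern ATTRIBUTE = Pattern.compile(
          "(\\s+[\\w:.-]+=\")        # group 1: whitespace, attribute name, equals sign, opening quote\n"
        + "(                         # group 2: the attribute value\n"
        + "  [^\"]*                  #   leading run of non-quote characters\n"
        + "  (?:                     #   then, repeatedly:\n"
        + "    \"                    #     a quote\n"
        + "    (?!                   #     that does not end the value, i.e. is not followed by\n"
        + "      \\s+[\\w:.-]+=\"    #       another attribute\n"
        + "      | \\s*(?:/?|\\?)>   #       or the end of the tag\n"
        + "    )\n"
        + "    [^\"]*                #     and more non-quote characters\n"
        + "  )*\n"
        + ")\n"
        + "\"                        # the real closing quote\n",
        Pattern.COMMENTS);

    public static void main(String[] args) {
        // Assumed sample input with stray quotes inside the value attribute.
        String tag = "<property name=\"foo\" value=\"because of \"something\" like that\"/>";
        Matcher m = ATTRIBUTE.matcher(tag);
        while (m.find()) {
            // Print the prefix and the raw value of each attribute found.
            System.out.println("[" + m.group(1) + "] [" + m.group(2) + "]");
        }
    }
}
```

The lookahead is a heuristic: a stray quote that happens to be followed by whitespace and something that looks like name=" is treated as the end of the value, so input of that shape would not be repaired correctly.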

Here is a Java code demo:

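import java.util.regex.Matcher;
import java.util.regex.Pattern;

String s =  "<?xml version=\"1.0\" encoding=\"UTF-8\"?> <resources> <resource> <properties> <property name=\"name\" value=\"retrieveFoo\"/>\n<property name=\"foo\" value=\"Some corrupted String because of \"something\" like that\"/>";
StringBuffer result = new StringBuffer();
// Group 1: whitespace, attribute name and opening quote. Group 2: the raw attribute value.
Matcher m = Pattern.compile("(\\s+[\\w:.-]+=\")([^\"]*(?:\"(?!\\s+[\\w:.-]+=\"|\\s*(?:/?|\\?)>)[^\"]*)*)\"").matcher(s);
while (m.find()) {
    // Escape the stray quotes; quoteReplacement keeps any '$' or '\' in the value literal.
    String fixed = m.group(1) + m.group(2).replace("\"", "&quot;") + "\"";
    m.appendReplacement(result, Matcher.quoteReplacement(fixed));
}
m.appendTail(result);
System.out.println(result.toString());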

Output:

<?xml version="1.0" encoding="UTF-8"?> <resources> <resource> <properties> <property name="name" value="retrieveFoo"/>
<property name="foo" value="Some corrupted String because of &quot;something&quot; like that"/>
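Since the goal is a valid XML string, it can help to wrap the replacement in a method and check the result with the JDK's DOM parser. The following is a minimal sketch under that assumption: the class AttributeQuoteRepair, its methods repair and isWellFormed, and the sample document are hypothetical names and data, not part of the original answer. It avoids language features newer than Java 6.

```java
import java.io.StringReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

// Hypothetical helper class; names are illustrative only.
public class AttributeQuoteRepair {

    // Same pattern as in the answer. Group 1: whitespace, attribute name and
    // opening quote. Group 2: the raw value, which may contain stray quotes.
    private static final Pattern ATTRIBUTE = Pattern.compile(
            "(\\s+[\\w:.-]+=\")([^\"]*(?:\"(?!\\s+[\\w:.-]+=\"|\\s*(?:/?|\\?)>)[^\"]*)*)\"");

    // Replaces stray double quotes inside attribute values with &quot;.
    public static String repair(String xml) {
        StringBuffer result = new StringBuffer();
        Matcher m = ATTRIBUTE.matcher(xml);
        while (m.find()) {
            String fixed = m.group(1) + m.group(2).replace("\"", "&quot;") + "\"";
            // quoteReplacement keeps '$' and '\' in the value from being read as group references.
            m.appendReplacement(result, Matcher.quoteReplacement(fixed));
        }
        m.appendTail(result);
        return result.toString();
    }

    // Returns true if the default DOM parser accepts the string as well-formed XML.
    public static boolean isWellFormed(String xml) {
        try {
            DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumed sample: a complete document with one corrupted attribute value.
        String broken = "<properties><property name=\"foo\" "
                + "value=\"because of \"something\" like that\"/></properties>";
        String repaired = repair(broken);
        System.out.println("broken well-formed:   " + isWellFormed(broken));
        System.out.println("repaired well-formed: " + isWellFormed(repaired));
        System.out.println(repaired);
    }
}
```

The default parser may log a fatal error to standard error when it rejects the broken input; that does not affect the boolean result. The question asked for single quotes, so replace("\"", "'") could be swapped in for the &quot; replacement if the quoting style matters more than preserving the original character.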



Source: https://stackoverflow.com/questions/33744460/java-removing-the-double-quotes-in-xml-attributes
