In RegEx, I want to find the tag and everything between two XML tags, like the following:
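This can capture most outermost pairs of tags, including tags with attributes inside and tags without end tags. It relies on recursion via (?R), so it needs an engine that supports that, such as PCRE: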
(<!--((?!-->).)*-->|<\w*((?!\/<).)*\/>|<(?<tag>\w+)[^>]*>(?>[^<]|(?R))*<\/\k<tag>\s*>)
Edit: as mentioned in the comment above, regex is never enough to parse XML. Modifying the regex to fit more situations only makes it longer, and it is still useless.
Using this method is not a good idea, but if you really want to split it with regex:
<primaryAddress.*>((.|\n)*?)<\/primaryAddress>
The accepted answer returns the tags as well, but this returns just the value between the tags.
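A minimal Java sketch of the same idea, for illustration only. The class name TagContent and the sample input are made up, and the pattern is a slight variation of the one above: it uses [^>]* instead of .* so the opening tag cannot swallow later tags, and DOTALL instead of (.|\n) so that . also matches newlines.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagContent {
    // \b stops the name from matching longer tag names; [^>]* allows attributes.
    // DOTALL lets '.' span line breaks, and *? stops at the first closing tag.
    private static final Pattern ADDRESS = Pattern.compile(
            "<primaryAddress\\b[^>]*>(.*?)</primaryAddress>", Pattern.DOTALL);

    /** Returns the text between the first pair of tags, or null if nothing matches. */
    static String extract(String xml) {
        Matcher m = ADDRESS.matcher(xml);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Hypothetical input, not taken from the question.
        String xml = "<person><primaryAddress type=\"home\">1 Main St\nSpringfield</primaryAddress></person>";
        System.out.println(extract(xml));
    }
}
```

Group 1 holds only the content, so the surrounding tags are not part of the returned value.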
In our case, we receive an XML document as a String and need to deal with the values that contain some "special" characters, like &, < and >. Basically, someone can provide an XML to us in this form:
<notes>
<note>
<to>jenice & carl </to>
<from>your neighbor <; </from>
</note>
</notes>
So I need to find the values jenice & carl and your neighbor <; in that String and properly escape & and < (otherwise the XML is invalid if you later pass it to an engine that shall remain unnamed).
Doing this with regex is a rather dumb idea to begin with, but it's cheap and easy. So, for the brave ones who would like to do the same thing I did, here you go:
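import java.util.regex.Matcher;
import java.util.regex.Pattern;

String xml = ...
// Group 1: tag name, group 2: the value, group 3: closing tag name (same as group 1).
Pattern p = Pattern.compile("<(.+)>(?!\\R<)(.+)</(\\1)>");
Matcher m = p.matcher(xml);
// replaceAll(Function) needs Java 9+; read the groups from the MatchResult (mr), not from m.
String result = m.replaceAll(mr -> {
    String value = mr.group(2);
    if (value.contains("&")) {
        value = value + "+ some change"; // placeholder for the actual change
    }
    // quoteReplacement keeps '$' and '\' in the value from being treated as group references.
    return Matcher.quoteReplacement("<" + mr.group(1) + ">" + value + "</" + mr.group(3) + ">");
});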
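For reference, here is a self-contained variant that actually performs the escaping. It is only a sketch under a strong assumption: each leaf element sits on a single line with no nested elements on that line, as in the sample above. The names EscapeLeafText, escapeText and escape are made up for this example, the pattern is simpler than the one above, and Java 9 or later is needed for replaceAll with a function.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EscapeLeafText {
    // <tag>text</tag> on one line. '.' does not cross line breaks by default,
    // so parent elements that span several lines are left untouched.
    private static final Pattern LEAF = Pattern.compile("<(\\w+)>(.*?)</\\1>");

    // Escapes '&' unless it already starts an entity, then escapes '<'.
    static String escapeText(String text) {
        return text
                .replaceAll("&(?!(?:amp|lt|gt|quot|apos|#\\d+|#x[0-9a-fA-F]+);)", "&amp;")
                .replace("<", "&lt;");
    }

    static String escape(String xml) {
        Matcher m = LEAF.matcher(xml);
        // quoteReplacement keeps '$' and '\' in the text from being read as group references.
        return m.replaceAll(mr -> Matcher.quoteReplacement(
                "<" + mr.group(1) + ">" + escapeText(mr.group(2)) + "</" + mr.group(1) + ">"));
    }

    public static void main(String[] args) {
        String xml = "<notes>\n<note>\n<to>jenice & carl </to>\n"
                + "<from>your neighbor <; </from>\n</note>\n</notes>";
        System.out.println(escape(xml));
    }
}
```

If a leaf element and a nested element share one line, this sketch would escape the nested tags too, which is one more reason to prefer a real parser whenever the input is well-formed enough for one.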
However, if you want to do it anyway, search for the regex pattern
<primaryAddress>[\s\S]*?<\/primaryAddress>
and replace it with an empty string.
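In Java, that search-and-replace could look roughly like this. The class and method names are hypothetical, and the sketch assumes the opening tag carries no attributes, as in the pattern above.

```java
class RemoveElement {
    static String removePrimaryAddress(String xml) {
        // [\s\S] matches any character including newlines; *? stops at the first closing tag.
        return xml.replaceAll("<primaryAddress>[\\s\\S]*?</primaryAddress>", "");
    }

    public static void main(String[] args) {
        // Hypothetical input, not taken from the question.
        String xml = "<p><primaryAddress>1 Main St\nTown</primaryAddress><name>x</name></p>";
        System.out.println(removePrimaryAddress(xml));
    }
}
```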
You should be able to match it with: /<primaryAddress>(.+?)<\/primaryAddress>/
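The content between the tags will be in the first capture group. Note that in most engines . does not match newlines by default, so this only works when the content is on a single line.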