Find everything between two XML tags with RegEx

无人及你 2020-11-27 13:02

In RegEx, I want to find the tag and everything between two XML tags, like the following:


    

        
5 Answers
  • 2020-11-27 13:44

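    This pattern captures the outermost pair of tags in most cases, including tags with attributes and self-closing tags that have no end tag. It relies on recursion ((?R)), so it needs an engine that supports it, such as PCRE.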

    (<!--((?!-->).)*-->|<\w*((?!\/<).)*\/>|<(?<tag>\w+)[^>]*>(?>[^<]|(?R))*<\/\k<tag>\s*>)
    

    Edit: as mentioned in the comment above, regex is never enough to parse XML. Extending the pattern to cover more situations only makes it longer, and it is still unreliable.
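To illustrate the point about regex not being enough, here is a minimal sketch of the parser-based alternative using the DOM API that ships with the JDK. The class name ParserSketch, the helper firstElementText, and the sample XML are hypothetical and chosen only for this illustration; the question's original input is not shown here.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ParserSketch {
    // Returns the text content of the first element named tagName, or null if there is none.
    static String firstElementText(String xml, String tagName) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            NodeList nodes = doc.getElementsByTagName(tagName);
            return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent();
        } catch (Exception e) {
            // Malformed input ends up here instead of producing a silently wrong match.
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample input.
        String xml = "<person><primaryAddress><city>Townsville</city></primaryAddress></person>";
        System.out.println(firstElementText(xml, "primaryAddress"));
    }
}
```

Unlike a pattern, the parser handles attributes, nesting, comments and CDATA without extra work, and it rejects input that is not well-formed.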

  • 2020-11-27 13:45

    Using regex for this is not a good approach, but if you really want to extract the content with regex:

    <primaryAddress[^>]*>((.|\n)*?)<\/primaryAddress>
    

    The accepted answer returns the tags as well, whereas this pattern captures only the value between the tags (in group 1).

  • 2020-11-27 13:45

    In our case, we receive XML as a String and need to fix up the values that contain "special" characters such as &, < and >. Basically, someone can provide XML to us in this form:

    <notes>
      <note>
         <to>jenice & carl </to>
         <from>your neighbor <; </from>
      </note>
    </notes>
    

    So I need to find the values jenice & carl and your neighbor <; in that String and properly escape & and < (otherwise the XML is invalid when you later pass it to an engine that shall remain unnamed).

    Doing this with regex is a rather dumb idea to begin with, but it's cheap and easy. So for the brave ones who would like to do the same thing I did, here you go:

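        import java.util.regex.Matcher;
        import java.util.regex.Pattern;

        String xml = ...
        // Group 1: tag name, group 2: text content, group 3: the same name in the closing tag.
        Pattern p = Pattern.compile("<(.+)>(?!\\R<)(.+)</(\\1)>");
        Matcher m = p.matcher(xml);
        // Read groups from the MatchResult passed to the lambda (mr), not from the outer Matcher (Java 9+).
        String result = m.replaceAll(mr -> {
            String content = mr.group(2);
            if (content.contains("&")) {
                content = content + "+ some change";
            }
            // quoteReplacement keeps '$' and '\' in the text from being read as group references.
            return Matcher.quoteReplacement("<" + mr.group(1) + ">" + content + "</" + mr.group(3) + ">");
        });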
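The snippet above only marks the affected values. A minimal sketch of what the escaping step itself might look like follows, assuming every leaf element sits on a single line and contains text only, as in the sample. The class EscapeSketch, the method escapeLeafText, and both patterns are hypothetical names introduced for this illustration, not part of the original answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EscapeSketch {
    // Assumption: a leaf element is <name>text</name> on one line ('.' does not cross line breaks here).
    private static final Pattern LEAF = Pattern.compile("<([A-Za-z_][\\w.-]*)>(.*?)</\\1>");
    // An ampersand that does not already start an entity or a character reference.
    private static final Pattern BARE_AMP =
            Pattern.compile("&(?!(?:amp|lt|gt|quot|apos|#\\d+|#x[0-9A-Fa-f]+);)");

    static String escapeLeafText(String xml) {
        return LEAF.matcher(xml).replaceAll(mr -> {
            // Escape '&' first so the '&' of the inserted "&lt;" is not touched again.
            String text = BARE_AMP.matcher(mr.group(2)).replaceAll("&amp;");
            text = text.replace("<", "&lt;");
            String element = "<" + mr.group(1) + ">" + text + "</" + mr.group(1) + ">";
            return Matcher.quoteReplacement(element);
        });
    }

    public static void main(String[] args) {
        String xml = "<notes>\n  <note>\n     <to>jenice & carl </to>\n"
                + "     <from>your neighbor <; </from>\n  </note>\n</notes>\n";
        System.out.print(escapeLeafText(xml));
    }
}
```

An element that holds child elements on the same line would have its children escaped as text, so this only suits input shaped like the example.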
    
  • 2020-11-27 13:57

    It is not a good idea to use regex for HTML/XML parsing...

    However, if you want to do it anyway, search for the regex pattern

    <primaryAddress>[\s\S]*?<\/primaryAddress>
    

    and replace it with an empty string.
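As a rough sketch, the search-and-replace described above could look like this in Java. The class RemoveSketch, the method removePrimaryAddress, and the sample input are hypothetical, and the pattern assumes the opening tag has no attributes.

```java
import java.util.regex.Pattern;

class RemoveSketch {
    // [\s\S]*? matches any character, line breaks included, and stops at the first closing tag.
    private static final Pattern BLOCK =
            Pattern.compile("<primaryAddress>[\\s\\S]*?</primaryAddress>");

    // Deletes every <primaryAddress>...</primaryAddress> block, tags included.
    static String removePrimaryAddress(String xml) {
        return BLOCK.matcher(xml).replaceAll("");
    }

    public static void main(String[] args) {
        // Hypothetical sample input.
        String xml = "<person>\n<primaryAddress>\n<city>X</city>\n</primaryAddress>\n<name>N</name>\n</person>";
        System.out.println(removePrimaryAddress(xml));
    }
}
```

Because the quantifier is lazy, two separate blocks are removed one by one and the text between them is kept.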

  • 2020-11-27 14:06

    You should be able to match it with: /<primaryAddress>(.+?)<\/primaryAddress>/

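    The content between the tags will be in the first capture group. Since . does not match line breaks by default, content that spans several lines needs the dotall (s) flag, or [\s\S] in place of the dot.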
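For reference, reading the capture group might be sketched in Java as follows. The class CaptureSketch and the method innerContent are hypothetical names, and Pattern.DOTALL is added here as an assumption so that content spanning several lines is matched as well.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CaptureSketch {
    // DOTALL lets '.' match line terminators; (.+?) is lazy and stops at the first closing tag.
    private static final Pattern INNER =
            Pattern.compile("<primaryAddress>(.+?)</primaryAddress>", Pattern.DOTALL);

    // Returns the text between the first pair of tags, or null if no pair is found.
    static String innerContent(String xml) {
        Matcher m = INNER.matcher(xml);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Hypothetical sample input.
        String xml = "<primaryAddress>\n  <city>X</city>\n</primaryAddress>";
        System.out.println(innerContent(xml));
    }
}
```

Group 0 would be the whole match including both tags, while group 1 holds only what lies between them.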
