Java regex to strip out XML tags, but not tag contents

有刺的猬 2021-02-08 13:37

I have the following Java code:

str = str.replaceAll("<.*?>.*?</.*?>|<.*?/>", "");

This turns a String like so:



        
6 Answers
  • 2021-02-08 13:44

    You can try this too:

    str = str.replaceAll("<.*?>", "");
    

    Please have a look at the below example for better understanding:

    public class StringUtils {
    
        public static void main(String[] args) {
            System.out.println(StringUtils.replaceAll("How now <fizz>brown</fizz> cow."));
            System.out.println(StringUtils.replaceAll("How <buzz>now <fizz>brown</fizz><yoda/></buzz> cow."));
        }
    
        public static String replaceAll(String strInput) {
            return strInput.replaceAll("<.*?>", "");
        }
    }
    

    Output:

    How now brown cow.
    How now brown cow.
    
  • 2021-02-08 13:49

    You were almost there ;)

    Try this:

    str = str.replaceAll("<.*?>", "");
    
  • 2021-02-08 13:52

    While there are other correct answers, none of them gives any explanation.

    Your regex <.*?>.*?</.*?>|<.*?/> doesn't work because it matches each pair of tags together with everything inside them, so the contents are removed too. You can see that in action on debuggex.

    Your second attempt, <.*?></.*?>|<.*?/>, doesn't work because it matches from the beginning of a tag up to the first closing tag that immediately follows another tag. That is kind of a mouthful, but you can understand better what's going on in this example.

    The regex you need is much simpler: <.*?>. It matches every tag on its own, whether it is an opening or a closing tag, and leaves the text between tags alone. Visualization.
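
To make the difference concrete, here is a minimal sketch that runs the three patterns from this answer over the sample strings used in the first answer. The class and method names are made up for illustration.

```java
class TagRegexComparison {
    // First attempt: matches a tag pair together with its contents.
    static String firstAttempt(String s) {
        return s.replaceAll("<.*?>.*?</.*?>|<.*?/>", "");
    }

    // Second attempt: only matches where a closing tag directly follows a '>'.
    static String secondAttempt(String s) {
        return s.replaceAll("<.*?></.*?>|<.*?/>", "");
    }

    // Suggested regex: matches each tag on its own.
    static String tagsOnly(String s) {
        return s.replaceAll("<.*?>", "");
    }

    public static void main(String[] args) {
        String simple = "How now <fizz>brown</fizz> cow.";
        String nested = "How <buzz>now <fizz>brown</fizz><yoda/></buzz> cow.";

        System.out.println(firstAttempt(simple));   // "brown" is removed with its tags
        System.out.println(secondAttempt(simple));  // nothing matches, string unchanged
        System.out.println(secondAttempt(nested));  // everything from <buzz> to </buzz> is removed
        System.out.println(tagsOnly(simple));       // How now brown cow.
        System.out.println(tagsOnly(nested));       // How now brown cow.
    }
}
```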

  • 2021-02-08 13:58
    "How now <fizz>brown</fizz> cow.".replaceAll("<[^>]+>", "")
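
One practical difference between <[^>]+> and <.*?> is worth a quick sketch: by default `.` does not match a line terminator in Java, while the negated class [^>] does. The input below, with a tag that wraps onto a second line, is a hypothetical example and not from the question.

```java
class NegatedClassDemo {
    // Lazy dot: cannot cross a newline unless Pattern.DOTALL is enabled.
    static String stripLazyDot(String s) {
        return s.replaceAll("<.*?>", "");
    }

    // Negated class: matches any character except '>', including newlines.
    static String stripNegatedClass(String s) {
        return s.replaceAll("<[^>]+>", "");
    }

    public static void main(String[] args) {
        // Hypothetical input: the opening tag's attribute sits on a second line.
        String input = "How now <fizz\n kind=\"x\">brown</fizz> cow.";

        System.out.println(stripLazyDot(input));      // opening tag survives
        System.out.println(stripNegatedClass(input)); // How now brown cow.
    }
}
```

Neither pattern understands comments, CDATA sections, or a '>' inside an attribute value, so for anything beyond simple markup a real XML parser is the safer choice.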
    
  • 2021-02-08 14:01

    This isn't elegant, but it is easy to follow. The regex below removes the start and end XML tags when they appear together on a line, for example:

    <url>"www.xml.com"</url> , <body>"This is xml"</body>

    Regex :

    to_replace='<\w*>|<\/\w*>',value="" 
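
The to_replace/value pair above does not look like Java, so here is a rough Java equivalent using the same pattern with String.replaceAll. The class and method names are hypothetical. Note that \w* only matches plain tag names, so tags with attributes or self-closing tags are left in place.

```java
class WordTagStripper {
    // Same pattern as above: an opening tag or a closing tag whose name
    // consists only of word characters (letters, digits, underscore).
    static String strip(String s) {
        return s.replaceAll("<\\w*>|</\\w*>", "");
    }

    public static void main(String[] args) {
        String line = "<url>\"www.xml.com\"</url> , <body>\"This is xml\"</body>";
        System.out.println(strip(line)); // "www.xml.com" , "This is xml"

        // Limitation: the opening tag has an attribute, so only </a> is removed.
        System.out.println(strip("<a href=\"x\">link</a>"));
    }
}
```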
    
  • 2021-02-08 14:01

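    If you are parsing an XML log file in Java, you can also use the regex <[^<]+<. Applied to <name>DEV</name>, it matches <name>DEV<, and the part between the two outer angle brackets is name>DEV. You will have to experiment with the regex to fit your own data.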
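
As a sketch of how that idea might be used for extraction rather than stripping, the version below adds a capturing group to the same regex and splits the captured text on '>'. The capturing group, the split step, and all names are assumptions added for illustration; they are not part of the answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LogFieldSketch {
    // Capturing variant of <[^<]+<: group 1 is the text between the first '<'
    // and the next '<', e.g. "name>DEV".
    private static final Pattern FIELD = Pattern.compile("<([^<]+)<");

    // Returns {elementName, text} for the first match, or null if nothing matches.
    static String[] firstField(String line) {
        Matcher m = FIELD.matcher(line);
        if (!m.find()) {
            return null;
        }
        // Split once on '>' so "name>DEV" becomes {"name", "DEV"}.
        return m.group(1).split(">", 2);
    }

    public static void main(String[] args) {
        String[] field = firstField("<name>DEV</name>");
        System.out.println(field[0] + " = " + field[1]); // name = DEV
    }
}
```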
