Remove HTML tags from a String

误落风尘 2020-11-21 07:35

Is there a good way to remove HTML from a Java string? A simple regex like

replaceAll("\\<.*?>", "…

30 Answers
  •  离开以前
    2020-11-21 07:47

    Use an HTML parser instead of regex. This is dead simple with Jsoup.

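    import org.jsoup.Jsoup;

    // Parse the markup and return only its text content (tags removed, entities decoded).
    public static String html2text(String html) {
        return Jsoup.parse(html).text();
    }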
    

    Jsoup also supports removing HTML tags against a customizable whitelist, which is very useful if you want to allow only e.g. <b>, <i> and <u>.
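
    For comparison, here is a minimal sketch of why a plain regex tends to fall short. It uses only the standard library and a non-greedy tag pattern like the one in the question; the class name RegexStripDemo and the method stripTags are hypothetical, chosen just for this illustration.

```java
import java.util.regex.Pattern;

public class RegexStripDemo {
    // Non-greedy "anything between angle brackets" pattern, compiled once.
    // Hypothetical helper for illustration; not part of Jsoup.
    private static final Pattern TAG = Pattern.compile("<.*?>");

    /** Naive tag removal: deletes every "<...>" run and does nothing else. */
    public static String stripTags(String html) {
        return TAG.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        // Simple, well-formed markup is handled as expected.
        System.out.println(stripTags("<p>Hello <b>world</b></p>")); // Hello world

        // Character entities are left undecoded.
        System.out.println(stripTags("<p>Tom &amp; Jerry</p>"));    // Tom &amp; Jerry

        // Plain text that happens to sit between angle brackets is deleted.
        System.out.println(stripTags("if a < b and b > c"));        // if a  c
    }
}
```

    The last two cases are where a parser helps: it knows which angle brackets actually start a tag, and it decodes entities when extracting text.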

    See also:

    • RegEx match open tags except XHTML self-contained tags
    • What are the pros and cons of the leading Java HTML parsers?
    • XSS prevention in JSP/Servlet web application
