Remove HTML tags from a String

误落风尘 2020-11-21 07:35

Is there a good way to remove HTML from a Java string? A simple regex like

replaceAll("\\<.*?>", "…
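Here is a minimal sketch of the naive regex approach. The class name `RegexTagStrip` and the sample strings are hypothetical. It uses the Java string literal `"\\<.*?>"`, which matches a `<`, then as few characters as possible, then a `>`. The two sample calls show typical weaknesses of regex stripping. Entities such as `&amp;` are left undecoded. A `<` ... `>` pair in ordinary text is also deleted as if it were a tag.

```java
public class RegexTagStrip {
    // Naive approach: delete every "<...>" run (lazy match up to the first '>').
    // Note: '.' does not match line breaks, so tags spanning lines are not removed.
    static String stripTags(String html) {
        return html.replaceAll("\\<.*?>", "");
    }

    public static void main(String[] args) {
        // Tags are removed, but the entity is not decoded.
        System.out.println(stripTags("<p>Fish &amp; chips</p>")); // prints: Fish &amp; chips
        // Plain text between '<' and '>' is removed too (two spaces remain).
        System.out.println(stripTags("1 < 2 and 3 > 2"));         // prints: 1  2
    }
}
```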


        
30 answers
  •  不知归路
    2020-11-21 07:45

    One could also use Apache Tika for this purpose. By default it preserves the whitespace of the stripped HTML, which may be desirable in certain situations:

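    import java.io.InputStream;
    import org.apache.tika.metadata.Metadata;
    import org.apache.tika.parser.ParseContext;
    import org.apache.tika.parser.html.HtmlParser;
    InputStream htmlInputStream = ..  // the HTML source (elided in the original answer)
    HtmlParser htmlParser = new HtmlParser();
    // HtmlContentHandler is not a Tika core class; assumed here to be a SAX handler exposing getBodyText()
    HtmlContentHandler htmlContentHandler = new HtmlContentHandler();
    htmlParser.parse(htmlInputStream, htmlContentHandler, new Metadata(), new ParseContext());
    System.out.println(htmlContentHandler.getBodyText().trim());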
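If adding Tika as a dependency is not an option, the same parser-based idea can be sketched with the JDK's built-in (HTML 3.2-era) Swing parser, `javax.swing.text.html.parser.ParserDelegator`. This is a swapped-in technique and not what this answer uses. The class name `SwingTextExtractor` is hypothetical. The sketch is not claimed to preserve whitespace the way the answer describes for Tika. It collects the text runs the parser reports between tags, and entities such as `&amp;` should arrive already decoded.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class SwingTextExtractor {
    // Concatenates every text run the JDK HTML parser reports; tags are skipped.
    static String extractText(String html) {
        StringBuilder sb = new StringBuilder();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                // Called for each run of character data between tags.
                sb.append(data);
            }
        };
        try {
            // true = ignore any charset declared in a <meta> tag.
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            // A StringReader should not fail, but parse() declares IOException.
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(extractText("<p>Fish &amp; chips</p>"));
    }
}
```

No separator is inserted between block elements, so `<p>a</p><p>b</p>` yields adjacent text. Append a space or newline in `handleStartTag` if that matters.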
    
