Remove HTML tags from a String

后端 未结 30 3102
误落风尘
误落风尘 2020-11-21 07:35

Is there a good way to remove HTML from a Java string? A simple regex like

replaceAll("\\\\<.*?>", &quo         


        
30条回答
  •  野趣味
    野趣味 (楼主)
    2020-11-21 07:48

    One way to retain new-line info with JSoup is to precede all new line tags with some dummy string, execute JSoup and replace dummy string with "\n".

    String html = "

    Line one

    Line two

    Line three
    etc."; String NEW_LINE_MARK = "NEWLINESTART1234567890NEWLINEEND"; for (String tag: new String[]{"

    ","
    ","","","","","","","
  • "}) { html = html.replace(tag, NEW_LINE_MARK+tag); } String text = Jsoup.parse(html).text(); text = text.replace(NEW_LINE_MARK + " ", "\n\n"); text = text.replace(NEW_LINE_MARK, "\n\n");

提交回复
热议问题