Remove HTML tags from a String

误落风尘 2020-11-21 07:35

Is there a good way to remove HTML from a Java string? A simple regex like

replaceAll("\\<.*?>", "…
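
For context, here is a minimal, JDK-only sketch of what a tag-stripping regex of this kind does and where it tends to fall short. The class name `RegexTagStripper`, the method `stripTags`, and the sample inputs are hypothetical and chosen only for illustration; the pattern is a simplified `<.*?>` in the spirit of the one above, not necessarily the exact expression from the question.

```java
import java.util.regex.Pattern;

class RegexTagStripper {
    // Reluctant match: '<', then as few characters as possible, then '>'.
    private static final Pattern TAG = Pattern.compile("<.*?>");

    // Removes everything that looks like a tag; does nothing else.
    static String stripTags(String html) {
        return TAG.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        // Ordinary markup is removed as expected.
        System.out.println(stripTags("<p>Hello <b>world</b></p>")); // Hello world
        // Character entities are not decoded.
        System.out.println(stripTags("Fish &amp; chips"));          // Fish &amp; chips
        // Non-HTML text between angle brackets is lost.
        System.out.println(stripTags("if a < b and c > d"));        // if a  d
    }
}
```

The last two cases are the usual reason answers reach for an HTML parser instead of a bare regex.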


        
30 answers
  •  灰色年华
    2020-11-21 07:59

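    To get formatted plain HTML text (the text content with only line breaks kept), you can do this:

    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Element;

    // Entity-escaped so that append() inserts the marker as text, not as a <br> element.
    String BR_ESCAPED = "&lt;br/&gt;";
    Element el = Jsoup.parse(html).body();
    el.select("br").append(BR_ESCAPED);
    el.select("p").append(BR_ESCAPED + BR_ESCAPED);
    el.select("h1").append(BR_ESCAPED + BR_ESCAPED);
    el.select("h2").append(BR_ESCAPED + BR_ESCAPED);
    el.select("h3").append(BR_ESCAPED + BR_ESCAPED);
    el.select("h4").append(BR_ESCAPED + BR_ESCAPED);
    el.select("h5").append(BR_ESCAPED + BR_ESCAPED);
    // text() drops the real tags and returns the markers unescaped, as literal "<br/>".
    String nodeValue = el.text();
    // Collapse runs of three or more breaks into two.
    nodeValue = nodeValue.replaceAll("(\\s*<br[^>]*>){3,}", "<br/><br/>");

    To get formatted plain text instead, turn each <br/> marker into \n and replace the last line with:

    nodeValue = nodeValue.replaceAll("<br/>", "\n");
    nodeValue = nodeValue.replaceAll("(\\s*\n){3,}", "\n\n");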

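The collapsing step in this answer can be tried on its own, without jsoup. The sketch below is JDK-only and assumes the input string is what `text()` would return once the `<br/>` markers have been appended. The class name `BreakCollapser` and both method names are hypothetical.

```java
class BreakCollapser {
    // HTML variant: three or more consecutive <br> markers
    // (each optionally preceded by whitespace) become exactly two.
    static String collapseHtmlBreaks(String text) {
        return text.replaceAll("(\\s*<br[^>]*>){3,}", "<br/><br/>");
    }

    // Plain-text variant: turn every <br/> marker into a newline,
    // then collapse three or more line ends into one blank line.
    static String toPlainText(String text) {
        String withNewlines = text.replaceAll("<br/>", "\n");
        return withNewlines.replaceAll("(\\s*\n){3,}", "\n\n");
    }

    public static void main(String[] args) {
        String marked = "Title<br/><br/> <br/><br/>First paragraph<br/>second line";
        System.out.println(collapseHtmlBreaks(marked));
        System.out.println(toPlainText(marked));
    }
}
```

Runs of one or two breaks are left alone in both variants, so a single `<br/>` still yields a single line break.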