Jsoup - How to clean HTML by escaping, not deleting, the unwanted HTML?

谎友^ 2021-02-15 11:02

Is there a way of getting jsoup to clean a string with HTML in it by escaping the unwanted HTML rather than removing it completely? My example:

String dirty = \         


        
1 answer
  •  庸人自扰
    2021-02-15 11:43

    Assuming you are parsing HTML strings rather than whole documents (as in your question), this method will work:

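    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.nodes.TextNode;
    import org.jsoup.safety.Whitelist;
    import org.jsoup.select.Elements;
    public String escapeHtml(String source) {
        Document doc = Jsoup.parseBodyFragment(source);
        // Swap each <b> element for a text node holding its markup; jsoup entity-escapes text on output.
        Elements elements = doc.select("b");
        for (Element element : elements) {
            element.replaceWith(new TextNode(element.toString(), ""));
        }
        // Only <a> and the listed attributes survive cleaning; newer jsoup releases rename Whitelist to Safelist.
        return Jsoup.clean(doc.body().toString(), new Whitelist().addTags("a").addAttributes("a", "href", "name", "rel", "target"));
    }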
    

    You could replace the hard-coded "b" selector with a method parameter, so that callers can pass in the list of tags they want escaped.
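
    To illustrate the "pass in a list of tags" idea without depending on jsoup, here is a minimal sketch that swaps in a different technique: a regular expression instead of a parsed DOM. The class name TagEscaper and the methods escapeTags and escapeEntities are hypothetical names chosen for this example. The sketch only entity-escapes the opening and closing tags you name and leaves all other markup alone, so it is not a sanitizer on its own; a regex will also mis-handle edge cases such as a ">" inside an attribute value, which is why the jsoup-based method above is the safer route for real input.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of jsoup.
class TagEscaper {

    // Entity-escape the characters that would otherwise be read as markup.
    static String escapeEntities(String text) {
        return text.replace("&", "&amp;")
                   .replace("<", "&lt;")
                   .replace(">", "&gt;");
    }

    // Escape the opening and closing tags named in tagNames; leave everything else untouched.
    static String escapeTags(String source, List<String> tagNames) {
        if (tagNames.isEmpty()) {
            return source;
        }
        // Build an alternation such as \Qb\E|\Qi\E from the requested tag names.
        StringBuilder alternation = new StringBuilder();
        for (String name : tagNames) {
            if (alternation.length() > 0) {
                alternation.append('|');
            }
            alternation.append(Pattern.quote(name));
        }
        // Matches <b>, <b class="x"> and </b> (case-insensitively), but not <br>.
        Pattern tag = Pattern.compile(
                "</?(?:" + alternation + ")(?=[\\s/>])[^>]*>",
                Pattern.CASE_INSENSITIVE);
        Matcher m = tag.matcher(source);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(source, last, m.start());   // text before the tag, unchanged
            out.append(escapeEntities(m.group())); // the tag itself, escaped
            last = m.end();
        }
        out.append(source, last, source.length());
        return out.toString();
    }

    public static void main(String[] args) {
        String dirty = "This is <b>REALLY</b> dirty <i>code</i>";
        // prints: This is &lt;b&gt;REALLY&lt;/b&gt; dirty <i>code</i>
        System.out.println(escapeTags(dirty, Arrays.asList("b")));
    }
}
```

    With jsoup itself, the equivalent change would be to build the selector passed to doc.select from the same list of tag names, for example by joining them with commas.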

    The associated passing JUnit test:

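    @Test
    public void testHtmlEscaping() throws Exception {
        // The href value is a placeholder; the original URL is not shown in the question.
        String source = "This is <b>REALLY</b> dirty code from <a href=\"http://example.com/\">haxors-r-us</a>";
        String expected = "This is &lt;b&gt;REALLY&lt;/b&gt; dirty code from \n<a href=\"http://example.com/\">haxors-r-us</a>";
        String transformed = transformer.escapeHtml(source);
        assertEquals(expected, transformed);
    }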
    

    Note that I added a line break ("\n") before your "a" tag in the test's expected string because jsoup pretty-prints its output.
