jsoup: strip only HTML tags, not newline characters?

耶瑟儿~ 2021-01-16 11:24

I have the content below in Java, and I want to strip only the HTML tags while keeping the newline characters.

test1 test2 test 3

//lin
2 answers
  • 2021-01-16 11:44

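    You get a single line because text() normalizes whitespace: every run of whitespace characters, newlines included, is collapsed into a single space. You can work around this by taking the text of each paragraph separately and appending each one to a StringBuilder on its own line: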

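    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;

    final String html = "<p>test1 <b>test2</b> test 3 </p>"
                        + "<p>test4 </p>";

    Document doc = Jsoup.parse(html);
    StringBuilder sb = new StringBuilder();

    // Append the text of each <p> element on its own line.
    for (Element element : doc.select("p")) {
        // element.text() returns the element's text without tags,
        // with whitespace normalized.
        sb.append(element.text()).append('\n');
    }

    // trim() drops the trailing newline added after the last paragraph.
    System.out.println(sb.toString().trim());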
    

    Output:

    test1 test2 test 3
    test4
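
To make the whitespace behaviour described in this answer concrete, here is a minimal sketch in plain Java, without jsoup. It only approximates the effect of text() with regular expressions and is not jsoup's actual implementation; the class and method names (WhitespaceDemo, collapseAll, collapseKeepingNewlines) are hypothetical and exist only for this illustration.

```java
// Illustration only: a plain-Java approximation of whitespace handling,
// not jsoup's actual implementation.
class WhitespaceDemo {

    // Roughly the effect of text(): every run of whitespace, including '\n',
    // collapses to a single space, and the ends are trimmed.
    static String collapseAll(String s) {
        return s.replaceAll("\\s+", " ").trim();
    }

    // Alternative: collapse only spaces and tabs so that line breaks survive,
    // then remove the single space left on either side of each newline.
    static String collapseKeepingNewlines(String s) {
        return s.replaceAll("[ \\t]+", " ")
                .replaceAll(" ?\\n ?", "\n")
                .trim();
    }

    public static void main(String[] args) {
        String raw = "test1  test2\n  test 3 \n";
        System.out.println(collapseAll(raw));
        System.out.println(collapseKeepingNewlines(raw));
    }
}
```

For the sample string in main, collapseAll yields the single line "test1 test2 test 3", while collapseKeepingNewlines yields "test1 test2" and "test 3" on separate lines. This is why the text has to be collected paragraph by paragraph if the line structure matters.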
    
  • 2021-01-16 11:53

    You can also do this:

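    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.safety.Whitelist;

    public static String cleanNoMarkup(String input) {
        // Disable pretty-printing so jsoup does not reflow the output.
        final Document.OutputSettings outputSettings =
                new Document.OutputSettings().prettyPrint(false);
        // Empty base URI; Whitelist.none() permits no tags at all.
        return Jsoup.clean(input, "", Whitelist.none(), outputSettings);
    }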
    

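    The important things here are:

    1. Whitelist.none() - so no markup is allowed
    2. .prettyPrint(false) - so the line breaks in the input are preserved

    Note that jsoup 1.14.2 renamed Whitelist to Safelist and deprecated the old class, so on newer versions use Safelist.none() instead.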
