I have the content below in Java, and I want to strip only the HTML tags but keep the newline characters.
test1 test2 test 3
//lin
You get a single line because text()
normalizes whitespace: line breaks and runs of spaces are collapsed into single spaces.
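To illustrate the effect, here is a minimal sketch that approximates this whitespace normalization with a plain regex. It uses only the standard library; the class name `WhitespaceDemo` and the helper `normalize` are hypothetical and are not part of jsoup, whose real implementation may differ in details.

```java
class WhitespaceDemo {
    // Hypothetical helper (not a jsoup API): collapses every run of
    // whitespace, including line breaks, into one space and trims the ends.
    static String normalize(String s) {
        return s.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        String text = "test1 test2 test 3\ntest4";
        // The line break is collapsed, so everything ends up on one line.
        System.out.println(normalize(text)); // prints "test1 test2 test 3 test4"
    }
}
```

Once the line break has been collapsed like this it cannot be recovered from the result, which is why the line structure has to be rebuilt from the elements themselves.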
But you can use a StringBuilder
and insert each line there:
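import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

final String html = "<p>test1 <b>test2</b> test 3 </p>"
        + "<p>test4 </p>";

Document doc = Jsoup.parse(html);
StringBuilder sb = new StringBuilder();

// Append the text of each paragraph on its own line.
for (Element element : doc.select("p")) {
    /*
     * element.text() returns the text of this element (= without tags).
     */
    sb.append(element.text()).append('\n');
}

// trim() drops the trailing newline added after the last paragraph.
System.out.println(sb.toString().trim());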
Output:
test1 test2 test 3
test4
You can also do this:
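public static String cleanNoMarkup(String input) {
    // Disable pretty-printing so the original line breaks are kept as-is.
    final Document.OutputSettings outputSettings = new Document.OutputSettings().prettyPrint(false);
    // Whitelist.none() allows no tags at all, so only text survives.
    // Note: newer jsoup releases rename Whitelist to Safelist.
    return Jsoup.clean(input, "", Whitelist.none(), outputSettings);
}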
The important things here are:
1. Whitelist.none() - so no markup is allowed
2. .prettyPrint(false) - so line breaks are not removed