After updating from jsoup 1.7.3 to 1.8.1 I get different results. In 1.7.3 the title attribute was returned escaped, the same as the input; in 1.8.1 the escaped br is converted into a real tag.
It's a bit late, but this could help others.
I upgraded from jsoup 1.7.2 to 1.11.3 and saw the same behaviour: the escaping is no longer implicit.
The following code did the trick for me:
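```java
import org.apache.commons.lang3.StringEscapeUtils;
import org.jsoup.Jsoup;
import org.jsoup.safety.Whitelist;

String cleanText = Jsoup.clean(s, Whitelist.none());
// &, < and > are escaped by the .clean call, so unescape them before escaping again
String cleanUnencodedText = StringEscapeUtils.unescapeHtml3(cleanText);
String cleanEncodedText = StringEscapeUtils.escapeHtml3(cleanUnencodedText);
```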
As you can see, I first had to unescape cleanText, because &, < and > are already escaped by the Jsoup.clean call.
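To show why the unescape step matters, here is a stdlib-only sketch. escapeBasic and unescapeBasic are hypothetical stand-ins for the StringEscapeUtils methods and only handle &, < and >; the cleaned string is an assumed example of what Jsoup.clean might return for the input "a < b & c".

```java
/**
 * Stdlib-only sketch: escaping already-escaped text double-escapes it.
 * escapeBasic/unescapeBasic are hypothetical, reduced stand-ins for
 * StringEscapeUtils and only handle &, < and >.
 */
public class DoubleEscapeSketch {

    // Escape & first so the entities produced for < and > are not escaped again.
    static String escapeBasic(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Unescape &amp; last so that text like "&amp;lt;" becomes "&lt;", not "<".
    static String unescapeBasic(String s) {
        return s.replace("&lt;", "<").replace("&gt;", ">").replace("&amp;", "&");
    }

    public static void main(String[] args) {
        // Assumed output of Jsoup.clean(...) for the input "a < b & c".
        String cleanText = "a &lt; b &amp; c";

        // Escaping directly escapes the existing entities a second time.
        System.out.println(escapeBasic(cleanText));
        // Unescape first, then escape: each character ends up escaped exactly once.
        System.out.println(escapeBasic(unescapeBasic(cleanText)));
    }
}
```

The first line prints a &amp;lt; b &amp;amp; c, which a browser would display literally as "a &lt; b &amp; c". The second line prints the correctly escaped a &lt; b &amp; c.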
You can use unescapeHtml4 and escapeHtml4 instead of the HTML 3 versions. I had to support the old HTML version because, for example, HTML 4 escapes € as &euro;.
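A stdlib-only sketch of the HTML 3 vs HTML 4 difference for the euro sign. escapeHtml3Like and escapeHtml4Like are hypothetical, heavily reduced stand-ins; the real StringEscapeUtils methods use much larger entity tables. The relevant point is that HTML 3.2's named entities cover only ISO 8859-1, while HTML 4 added &euro;.

```java
/**
 * Stdlib-only sketch of why escapeHtml3 and escapeHtml4 differ for the euro sign.
 * escapeHtml3Like/escapeHtml4Like are hypothetical, reduced stand-ins for the
 * StringEscapeUtils methods, which cover far larger entity tables.
 */
public class EuroEntitySketch {

    // Basic escapes shared by both versions (& first to avoid double escaping).
    static String escapeBasic(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // HTML 3.2 has no named entity for the euro sign, so it passes through unchanged.
    static String escapeHtml3Like(String s) {
        return escapeBasic(s);
    }

    // HTML 4 defines &euro;, so an HTML 4 escaper replaces the character with the entity.
    static String escapeHtml4Like(String s) {
        return escapeBasic(s).replace("\u20AC", "&euro;");
    }

    public static void main(String[] args) {
        String price = "Price: 5 \u20AC";
        System.out.println(escapeHtml3Like(price)); // euro sign kept as-is
        System.out.println(escapeHtml4Like(price)); // euro sign becomes &euro;
    }
}
```

If the consumer of the escaped text only understands the older entity set, the HTML 3 variant avoids emitting entities it cannot decode.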