jsoup: different results after updating from 1.7.3 to 1.8.1, how to avoid this?

时光取名叫无心 2021-01-28 04:22

After updating from jsoup 1.7.3 to 1.8.1 I get different results. In 1.7.3 the title attribute was returned escaped, the same as the input; in 1.8.1 the br is converted into a tag. I

1 answer
  • 2021-01-28 05:16

    It's a bit late but could help some others.

    I upgraded from jsoup 1.7.2 to 1.11.3 and saw the same behaviour: the escaping is no longer implicit.

    The following code did the trick for me:

    String cleanText = Jsoup.clean(s, Whitelist.none());
    // Jsoup.clean escapes &, < and >, so unescape them first, then re-escape once
    String cleanUnencodedText = StringEscapeUtils.unescapeHtml3(cleanText);
    String cleanEncodedText = StringEscapeUtils.escapeHtml3(cleanUnencodedText);
    

    As you can see, I first had to unescape cleanText because &, < and > are escaped by the Jsoup.clean call.
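To show why the unescape step matters, here is a minimal sketch that uses only the Java standard library. The class `EscapeRoundTrip` and the helpers `escapeBasic` and `unescapeBasic` are hypothetical stand-ins written for this illustration; they handle only &, < and > and are not the jsoup or StringEscapeUtils implementations. The input string is an assumed example of already-escaped text.

```java
public class EscapeRoundTrip {

    // Hypothetical stand-in for an HTML escaper: handles only &, < and >.
    // & is replaced first so the entities produced for < and > are not touched again.
    static String escapeBasic(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Inverse of escapeBasic. &amp; is decoded last so that "&amp;lt;"
    // becomes "&lt;" rather than "<".
    static String unescapeBasic(String s) {
        return s.replace("&lt;", "<").replace("&gt;", ">").replace("&amp;", "&");
    }

    public static void main(String[] args) {
        // Assumed example of text that has already been escaped once.
        String cleaned = "Tom &amp; Jerry &lt;3";

        // Escaping again without unescaping doubles the entities.
        System.out.println(escapeBasic(cleaned));

        // Unescaping first means the text ends up escaped exactly once.
        System.out.println(escapeBasic(unescapeBasic(cleaned)));
    }
}
```

The first call yields double-escaped entities such as `&amp;amp;`, while the second returns the text escaped exactly once, which is the effect the unescape/escape pair in the answer is after.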

    You can use unescapeHtml4 and escapeHtml4 instead of the HTML 3 versions. I had to stay with the older HTML 3 variants because, for example, the HTML 4 variant escapes € as &euro;.
