How to unescape HTML character entities in Java?

耶瑟儿～ 2020-11-21 22:38

Basically I would like to decode a given HTML document and replace all special characters, such as "&nbsp;" -> " ", "&gt;" -> …

11 answers

遥遥无期 (original poster)

2020-11-21 23:32
The libraries mentioned in other answers would be fine solutions, but if you already happen to be digging through real-world HTML in your project, the jsoup project has a lot more to offer than just managing "ampersand pound FFFF semicolon" things.
```
import org.jsoup.Jsoup;
import org.jsoup.parser.Parser;

// textValue: <p>This is a&nbsp;sample. \"Granny\" Smith &#8211;.<\/p>\r\n
// becomes this: This is a sample. "Granny" Smith –.
// with one line of code:
// Jsoup.parse(textValue).getText(); // for older versions of Jsoup
String plainText = Jsoup.parse(textValue).text();

// Another possibility may be the static unescapeEntities method.
// jsoup names the second parameter inAttribute; true selects strict decoding.
boolean strictMode = true;
String unescapedString = Parser.unescapeEntities(textValue, strictMode);
```
You also get a convenient API for extracting and manipulating data, using the best of DOM, CSS, and jQuery-like methods. It is open source under the MIT licence.
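If adding a dependency is not an option, the "ampersand pound FFFF semicolon" part can be roughly sketched with the standard library alone. This is only an illustration of what entity unescaping involves, not how jsoup implements it: the class name `EntityDecoder` and its tiny named-entity table are hypothetical, and real HTML defines far more named entities than the six listed here.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EntityDecoder {
    // Hypothetical, deliberately tiny table; a real decoder needs the full HTML entity list.
    private static final Map<String, String> NAMED = Map.of(
            "amp", "&", "lt", "<", "gt", ">", "quot", "\"", "apos", "'", "nbsp", "\u00A0");

    // Matches &name;, &#1234; and &#x1F600; style references.
    private static final Pattern ENTITY =
            Pattern.compile("&(#[xX][0-9a-fA-F]+|#[0-9]+|[a-zA-Z][a-zA-Z0-9]*);");

    public static String unescape(String input) {
        Matcher m = ENTITY.matcher(input);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String replacement = decode(m.group(1));
            // Unknown or invalid references are copied through unchanged.
            m.appendReplacement(out, Matcher.quoteReplacement(
                    replacement != null ? replacement : m.group()));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Returns the decoded text, or null if the reference is not recognised.
    private static String decode(String body) {
        if (body.charAt(0) != '#') {
            return NAMED.get(body);
        }
        boolean hex = body.charAt(1) == 'x' || body.charAt(1) == 'X';
        try {
            int codePoint = hex
                    ? Integer.parseInt(body.substring(2), 16)
                    : Integer.parseInt(body.substring(1), 10);
            return Character.isValidCodePoint(codePoint)
                    ? new String(Character.toChars(codePoint))
                    : null;
        } catch (NumberFormatException e) {
            return null; // the digits overflowed an int
        }
    }
}
```

For example, `EntityDecoder.unescape("Smith &#8211;.")` returns `Smith –.`. Unlike `Jsoup.parse(...).text()`, this sketch does not strip tags or normalise whitespace, so for real-world HTML a proper parser remains the better choice.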