How to unescape HTML character entities in Java?

耶瑟儿~ 2020-11-21 22:38

Basically I would like to decode a given HTML document and replace all special characters, such as "&nbsp;" -> " " and "&gt;" -> ">".

11 answers
  •  遥遥无期
    2020-11-21 23:32

    The libraries mentioned in other answers would be fine solutions, but if you already happen to be digging through real-world HTML in your project, the Jsoup project has a lot more to offer than just managing "ampersand pound FFFF semicolon" things.

    // textValue: <p>This is a&nbsp;sample. \"Granny\" Smith &#8211;.<\/p>\r\n
    // becomes this: This is a sample. "Granny" Smith –.
    // with one line of code:
    // Jsoup.parse(textValue).getText(); // for older versions of Jsoup
    Jsoup.parse(textValue).text();

    // Another possibility may be the static unescapeEntities method:
    boolean strictMode = true;
    String unescapedString = org.jsoup.parser.Parser.unescapeEntities(textValue, strictMode);

    You also get a convenient API for extracting and manipulating data, using the best of DOM, CSS, and jQuery-like methods. It is open source under the MIT licence.
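
For readers who want to see what unescaping involves without adding a dependency, here is a minimal standard-library sketch (Java 9+). It is not how Jsoup does it: the class name `EntityUnescaper`, the method `unescape`, and the six-entry entity table are assumptions made for illustration. It handles decimal and hexadecimal numeric references plus a handful of named entities, and leaves anything it does not recognise unchanged.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative only: numeric references and a few named entities. */
final class EntityUnescaper {

    // Deliberately tiny table (an assumption for this sketch);
    // HTML defines far more named entities than these.
    private static final Map<String, String> NAMED = Map.of(
            "amp", "&",
            "lt", "<",
            "gt", ">",
            "quot", "\"",
            "apos", "'",
            "nbsp", "\u00A0");

    // Group 1: decimal digits, group 2: hex digits, group 3: entity name.
    private static final Pattern ENTITY = Pattern.compile(
            "&(?:#([0-9]+)|#[xX]([0-9a-fA-F]+)|([A-Za-z][A-Za-z0-9]*));");

    /** Replaces each recognised entity once, in a single left-to-right pass. */
    static String unescape(String input) {
        Matcher m = ENTITY.matcher(input);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            // quoteReplacement keeps '$' and '\' in the decoded text literal.
            m.appendReplacement(out, Matcher.quoteReplacement(decode(m)));
        }
        m.appendTail(out);
        return out.toString();
    }

    private static String decode(Matcher m) {
        try {
            if (m.group(1) != null) {
                return new String(Character.toChars(Integer.parseInt(m.group(1))));
            }
            if (m.group(2) != null) {
                return new String(Character.toChars(Integer.parseInt(m.group(2), 16)));
            }
        } catch (IllegalArgumentException e) {
            // Overflowing number or invalid code point: keep the original text.
            return m.group();
        }
        // Unknown names are left exactly as written.
        return NAMED.getOrDefault(m.group(3), m.group());
    }

    public static void main(String[] args) {
        System.out.println(unescape("&quot;Granny&quot; Smith &#8211; 1 &lt; 2"));
    }
}
```

Because the input is scanned once, `&amp;lt;` comes out as `&lt;` rather than being decoded a second time into `<`. For real-world documents a maintained library is still the safer choice, since it covers the full entity table and malformed input.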
