How to unescape HTML character entities in Java?

耶瑟儿~ 2020-11-21 22:38

Basically, I would like to decode a given HTML document and replace all special character entities, such as "&nbsp;" -> " " and "&gt;" -> ">".
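A minimal sketch of the literal replacements the question describes, using only String.replace. The class name NaiveUnescape is hypothetical. It covers just the two entities from the question plus &lt; and &amp;. A real document will contain many more entities than this.

```java
// Hypothetical example class: handles only &nbsp;, &gt;, &lt; and &amp;.
public class NaiveUnescape {

    // Replaces a fixed set of entities with plain characters.
    // &amp; is replaced last so that the escaped text "&amp;gt;" becomes the
    // literal "&gt;" instead of being unescaped twice into ">".
    static String unescape(String html) {
        return html
                .replace("&nbsp;", " ")  // plain space, as the question asks (not U+00A0)
                .replace("&gt;", ">")
                .replace("&lt;", "<")
                .replace("&amp;", "&");
    }

    public static void main(String[] args) {
        System.out.println(unescape("a&nbsp;&gt;&nbsp;b")); // a > b
        System.out.println(unescape("&amp;gt;"));           // &gt;
    }
}
```

The order of the calls matters: if &amp; were replaced first, "&amp;gt;" would turn into "&gt;" and then into ">". Also, &nbsp; strictly stands for the non-breaking space U+00A0. Mapping it to an ordinary space is a choice that matches the question, not what a general decoder would do.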

11 answers
  •  北荒 (OP)
    2020-11-21 23:25

    Consider using the HtmlManipulator Java class. You may need to add some entries to it, since not all entities are in its list.

    The Apache Commons StringEscapeUtils, as suggested by Kevin Hakanson, did not fully work for me: several entities, such as &lsquo; (left single quote), were somehow translated into '222'. I also tried org.jsoup and had the same problem.
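A minimal, standard-library-only sketch of the table-driven approach this answer describes. It is not the HtmlManipulator class itself. The class name TableUnescaper and its ENTITIES table are hypothetical, and the table holds only a few entries (including &lsquo;), so you would add the ones your documents use. It decodes in a single regex pass and also handles numeric references such as &#8216; and &#x2018;.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical table-driven decoder; extend ENTITIES as needed.
public class TableUnescaper {

    private static final Map<String, String> ENTITIES = new HashMap<>();
    static {
        ENTITIES.put("amp", "&");
        ENTITIES.put("lt", "<");
        ENTITIES.put("gt", ">");
        ENTITIES.put("quot", "\"");
        ENTITIES.put("apos", "'");
        ENTITIES.put("nbsp", "\u00A0");  // non-breaking space, not an ordinary space
        ENTITIES.put("lsquo", "\u2018"); // left single quotation mark
        ENTITIES.put("rsquo", "\u2019"); // right single quotation mark
    }

    // Matches &#x2018; (hex), &#8216; (decimal) and &name; forms.
    // Digit counts are bounded so Integer.parseInt cannot overflow.
    private static final Pattern ENTITY = Pattern.compile(
            "&(#[xX][0-9a-fA-F]{1,6}|#[0-9]{1,7}|[a-zA-Z][a-zA-Z0-9]*);");

    static String unescape(String html) {
        Matcher m = ENTITY.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String body = m.group(1);
            String replacement;
            if (body.startsWith("#x") || body.startsWith("#X")) {
                replacement = fromCodePoint(Integer.parseInt(body.substring(2), 16), m.group());
            } else if (body.startsWith("#")) {
                replacement = fromCodePoint(Integer.parseInt(body.substring(1)), m.group());
            } else {
                // Unknown names are left untouched rather than guessed.
                replacement = ENTITIES.getOrDefault(body, m.group());
            }
            // quoteReplacement keeps '$' and '\' in the output literal.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Turns a numeric reference into a string; keeps the original text if out of range.
    private static String fromCodePoint(int cp, String original) {
        return Character.isValidCodePoint(cp) ? new String(Character.toChars(cp)) : original;
    }

    public static void main(String[] args) {
        // Print the code point rather than the character, so console encoding cannot interfere.
        System.out.println(unescape("&lsquo;").codePointAt(0)); // 8216 (U+2018)
        System.out.println(unescape("&amp;gt;"));               // &gt;
    }
}
```

Because decoding happens in one pass, "&amp;gt;" becomes the text "&gt;" rather than ">", and unknown names such as "&bogus;" pass through unchanged. Printing code points, as main does, is one way to check whether a decoder produced U+2018 before any output encoding gets involved.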
