Is there a good way to remove HTML from a Java string? A simple regex like
replaceAll("\\\\<.*?>", &quo
The accepted answer, which simply calls Jsoup.parse(html).text(),
has two potential issues (with Jsoup 1.7.3):
It converts escaped text such as &lt;script&gt; into <script>
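The entity-decoding behaviour can be illustrated without any library. The following is only a minimal sketch: unescapeOnce is a hypothetical helper written for this example (it is not part of Jsoup or StringEscapeUtils) and it decodes just &lt;, &gt; and &amp;. It shows that one decoding pass turns harmless escaped text into markup, and that doubly escaped input needs a second pass before it becomes markup.

```java
public class EntityDecodingDemo {

    // Hypothetical helper for illustration only: decodes a single level of
    // the three entities &lt;, &gt; and &amp;. Real libraries handle many more.
    // &amp; is replaced last so that "&amp;lt;" is not decoded twice in one pass.
    static String unescapeOnce(String s) {
        return s.replace("&lt;", "<")
                .replace("&gt;", ">")
                .replace("&amp;", "&");
    }

    public static void main(String[] args) {
        String escaped = "&lt;script&gt;alert(1)&lt;/script&gt;";
        String doubleEscaped = "&amp;lt;script&amp;gt;";

        // One pass: escaped text becomes a real tag.
        System.out.println(unescapeOnce(escaped));

        // Doubly escaped input: the first pass yields escaped text,
        // the second pass yields a real tag.
        System.out.println(unescapeOnce(doubleEscaped));
        System.out.println(unescapeOnce(unescapeOnce(doubleEscaped)));
    }
}
```

Any step that decodes entities, including Jsoup.parse(html).text(), can therefore hand back text containing tags even though the input contained none.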
If you use Jsoup.parse(html).text() to protect against XSS, this is a bit annoying. Here is my best shot at an improved solution, using both Jsoup and StringEscapeUtils from Apache Commons Lang:
// breaks multi-level of escaping, preventing <script> to be rendered as