Is there a good way to remove HTML from a Java string? A simple regex like
replaceAll("\\\\<.*?>", &quo
I often find that I only need to strip out comments and script elements. This has worked reliably for me for 15 years and can easily be extended to handle any element name in HTML or XML:
// delete all comments
response = response.replaceAll("", "");
// delete all script elements
response = response.replaceAll("<(script|SCRIPT)[^+]*?>[^>]*?<(/script|SCRIPT)>", "");