Is there a good way to remove HTML from a Java string? A simple regex like
replaceAll("\\\\<.*?>", &quo
It sounds like you want to go from HTML to plain text.
If that is the case look at www.htmlparser.org. Here is an example that strips all the tags out from the html file found at a URL.
It makes use of org.htmlparser.beans.StringBean.
static public String getUrlContentsAsText(String url) {
String content = "";
StringBean stringBean = new StringBean();
stringBean.setURL(url);
content = stringBean.getStrings();
return content;
}