Is there any library or pre-written code to remove CSS attributes from HTML code?
The requirement is, the Java code has to parse through the input html document, and
You could use CyberNeko (NekoHTML) to parse the document and add a simple filter that looks something like this:
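import org.apache.xerces.xni.Augmentations;
import org.apache.xerces.xni.QName;
import org.apache.xerces.xni.XMLAttributes;
import org.apache.xerces.xni.XNIException;
import org.cyberneko.html.filters.DefaultFilter;

public class RemoveStyleFilter
    extends DefaultFilter
{
    @Override
    public void startElement(QName element, XMLAttributes attributes, Augmentations augs)
        throws XNIException
    {
        // Drop the styling attributes before the element is passed down the filter chain.
        for (String forbidden : new String[] {"class", "style"})
        {
            int index = attributes.getIndex(forbidden);
            if (index >= 0)
            {
                attributes.removeAttributeAt(index);
            }
        }
        super.startElement(element, attributes, augs);
    }
}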
Use jsoup and NodeTraversor to remove the class and style attributes from all elements:
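import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.select.NodeTraversor;
import org.jsoup.select.NodeVisitor;

Document doc = Jsoup.parse(input);
// Newer jsoup releases replace this constructor with a static NodeTraversor.traverse(visitor, root).
NodeTraversor traversor = new NodeTraversor(new NodeVisitor() {
    @Override
    public void tail(Node node, int depth) {
        // Only elements carry attributes; text and comment nodes are skipped.
        if (node instanceof Element) {
            Element e = (Element) node;
            e.removeAttr("class");
            e.removeAttr("style");
        }
    }
    @Override
    public void head(Node node, int depth) {
        // Nothing to do on the way down.
    }
});
// Starts at <body>, so attributes on <html> and inside <head> are left untouched.
traversor.traverse(doc.body());
String modifiedHtml = doc.toString();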
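If the input happens to be well-formed XHTML, the same walk-the-tree-and-drop-attributes idea can be sketched with the JDK's built-in DOM parser alone. This is only an illustration under that assumption: the `StyleStripper` class and its `strip` method are hypothetical names, not part of NekoHTML or jsoup, and the JDK parser will reject ordinary tag-soup HTML (and may try to resolve a DOCTYPE if one is present), so for real-world pages one of the parsers above is the better fit.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration; assumes well-formed XHTML input.
class StyleStripper {

    // Parses the markup, removes class/style from every element, and serializes it back.
    static String strip(String xhtml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xhtml)));
        stripRecursively(doc.getDocumentElement());

        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        // Leave out the <?xml ...?> header so the result looks like the input.
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    // Depth-first walk; removeAttribute is a no-op when the attribute is absent.
    private static void stripRecursively(Element element) {
        element.removeAttribute("class");
        element.removeAttribute("style");
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child instanceof Element) {
                stripRecursively((Element) child);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        String input = "<div class=\"box\" style=\"color:red\">"
                + "<p id=\"x\" class=\"note\">hi</p></div>";
        System.out.println(strip(input));
    }
}
```

Other attributes, such as `id` in the sample input, are left alone; adding more names to strip only means adding more `removeAttribute` calls.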