Removing CSS information from HTML in Java

一整个雨季 2021-01-13 10:58

Is there a library or pre-written code that removes CSS attributes from HTML?

The requirement is that the Java code has to parse the input HTML document, and

2 answers
  • 2021-01-13 11:20

    You could use CyberNeko (NekoHTML) to parse the document and add a simple filter that looks something like this:

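    import org.apache.xerces.xni.Augmentations;
    import org.apache.xerces.xni.QName;
    import org.apache.xerces.xni.XMLAttributes;
    import org.apache.xerces.xni.XNIException;
    import org.cyberneko.html.filters.DefaultFilter;

    // Drops the "class" and "style" attributes from every start tag that passes through the filter chain.
    public class RemoveStyleFilter
        extends DefaultFilter
    {
      @Override
      public void startElement(QName element, XMLAttributes attributes, Augmentations augs)
        throws XNIException
      {
        for (String forbidden : new String[] {"class", "style"})
        {
          // getIndex returns -1 when the attribute is absent
          int index = attributes.getIndex(forbidden);
          if (index >= 0)
          {
            attributes.removeAttributeAt(index);
          }
        }
        // Pass the cleaned element on to the next filter
        super.startElement(element, attributes, augs);
      }
    }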
    
  • 2021-01-13 11:23

    Use jsoup with a NodeTraversor to remove the class and style attributes from all elements:

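    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.nodes.Node;
    import org.jsoup.select.NodeTraversor;
    import org.jsoup.select.NodeVisitor;

    Document doc = Jsoup.parse(input);

    // Constructor form from older jsoup releases; newer releases expose a
    // static NodeTraversor.traverse(visitor, root) instead.
    NodeTraversor traversor = new NodeTraversor(new NodeVisitor() {
      @Override
      public void head(Node node, int depth) {
        // Nothing to do when entering a node
      }

      @Override
      public void tail(Node node, int depth) {
        // Text and comment nodes carry no attributes, so only elements are touched
        if (node instanceof Element) {
          Element e = (Element) node;
          e.removeAttr("class");
          e.removeAttr("style");
        }
      }
    });

    // Only the <body> subtree is visited
    traversor.traverse(doc.body());
    String modifiedHtml = doc.toString();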
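
For comparison, the same walk-and-strip idea can be sketched with only the JDK's built-in DOM classes. This is a sketch, not a replacement for the answers above: it assumes the input is well-formed XHTML, which real-world HTML often is not, so jsoup or NekoHTML remain the safer choice for arbitrary pages. The class name `XhtmlCssStripper` and its methods `strip` and `clean` are made up for this illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: strips class/style attributes from well-formed XHTML.
final class XhtmlCssStripper {

    // Recursively removes the "class" and "style" attributes below (and on) the given node.
    static void strip(Node node) {
        if (node instanceof Element) {
            Element e = (Element) node;
            // removeAttribute has no effect when the attribute is absent
            e.removeAttribute("class");
            e.removeAttribute("style");
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            strip(children.item(i));
        }
    }

    // Parses the input as XML, strips the attributes, and serializes the result.
    static String clean(String xhtml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xhtml)));
            strip(doc.getDocumentElement());

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception ex) {
            // Input that is not well-formed XML ends up here
            throw new IllegalArgumentException("Input is not well-formed XHTML", ex);
        }
    }
}
```

Calling `XhtmlCssStripper.clean(input)` returns the serialized markup with other attributes such as `id` left in place. Note that it leaves `<style>` elements and linked stylesheets untouched, just like the two answers above.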
    