How do I use the Java library “HTML Parser” to remove all <style> tags?

Posted by 梦想的初衷 on 2019-12-24 07:07:47

Question


I need to perform several actions on an HTML file, such as removing a specific tag or deleting attributes. I decided to use HTML Parser, a Java library: http://htmlparser.sourceforge.net/

First of all, I want to remove all the style tags. I managed to get a NodeList containing all the style tags by doing this:

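import org.htmlparser.Parser;
import org.htmlparser.filters.TagNameFilter;
import org.htmlparser.util.NodeList;
Parser parser = new Parser(url);
NodeList list = parser.parse(null); // null filter: return all top-level nodes
NodeList styles = list.extractAllNodesThatMatch(new TagNameFilter("STYLE"), true);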

Now I don't know how to delete these style tags from the whole list of nodes. Do I have to traverse the whole list?

After that, I want to be able to delete all the attributes inside the tags, or delete only the alt attributes, for example. Is there a method that does this automatically?


Answer 1:


According to the documentation, the Parser returns a list of trees that contains all of your HTML's nodes. Think of the parse result as the root of a big tree of Node objects, where each "level" of that tree is a NodeList.

You can traverse the tree recursively, test whether each node is a StyleTag, and remove it from the NodeList that contains it when it is. Keep descending until you have visited every node. Note that removing entries from the list returned by extractAllNodesThatMatch is not enough, because that call collects the matches into a new list and leaves the original tree unchanged.

NodeTreeWalker is your friend and can help you with the recursive tree traversal.
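The recursive removal described above could be sketched roughly as follows. This is a minimal illustration rather than the library's recommended approach: the class and method names (StyleStripper, removeStyleTags, strip) are hypothetical, the input is parsed from a string with Parser.createParser instead of a URL, and a hand-written recursion is used in place of NodeTreeWalker so that nodes are not removed while an iterator is walking them.

```java
import org.htmlparser.Node;
import org.htmlparser.Parser;
import org.htmlparser.tags.StyleTag;
import org.htmlparser.util.NodeList;
import org.htmlparser.util.ParserException;

// Hypothetical helper class, not part of HTML Parser itself.
public class StyleStripper {

    // Removes every StyleTag from the list and, recursively, from all children.
    // Iterating back to front means remove(i) never shifts an unvisited index.
    static void removeStyleTags(NodeList list) {
        for (int i = list.size() - 1; i >= 0; i--) {
            Node node = list.elementAt(i);
            if (node instanceof StyleTag) {
                list.remove(i);
            } else if (node.getChildren() != null) {
                // getChildren() is null for leaf nodes such as text
                removeStyleTags(node.getChildren());
            }
        }
    }

    // Parses an HTML string, strips the style tags, and serializes the result.
    static String strip(String html) {
        try {
            Parser parser = Parser.createParser(html, "UTF-8");
            NodeList list = parser.parse(null); // null filter: all top-level nodes
            removeStyleTags(list);
            return list.toHtml();
        } catch (ParserException e) {
            // Rethrown unchecked only to keep the sketch short.
            throw new IllegalStateException("could not parse HTML", e);
        }
    }
}
```

Calling StyleStripper.strip(html) on a page should return the same markup without its style elements. When starting from a URL as in the question, the same removeStyleTags call can be applied to the list returned by parser.parse(null).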

jsoup is another nice alternative that has a simpler interface (see this other question).
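For comparison, here is a rough sketch of the same cleanup with jsoup, which also covers the follow-up about dropping only the alt attributes. It assumes jsoup is on the classpath; the class and method names (JsoupCleaner, clean) are made up for illustration. Note that jsoup normalizes the document while parsing, so the output may be formatted differently from the input.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

// Hypothetical helper class, shown only to illustrate the jsoup calls.
public class JsoupCleaner {

    static String clean(String html) {
        Document doc = Jsoup.parse(html);
        // Remove every <style> element, wherever it appears in the tree.
        doc.select("style").remove();
        // Remove only the alt attribute from every element that has one.
        doc.select("[alt]").removeAttr("alt");
        return doc.html();
    }
}
```

The CSS-style selectors replace the manual tree traversal, which is the main reason the interface feels simpler.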



Source: https://stackoverflow.com/questions/8592418/how-do-i-use-the-java-library-html-parser-to-remove-all-style-tags
