How can I traverse the HTML tree using Jsoup?

拟墨画扇 submitted on 2019-12-22 04:40:13

Question


I think this question has been asked before, but I could not find anything.

Starting from the Document element in Jsoup, how can I traverse all elements in the HTML content?

I was reading the documentation and thinking about using the childNodes() method, but as far as I understand it only returns the nodes one level below. I think I could use recursion with this method, but I want to know if there is a more appropriate/native way to do this.
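
For reference, the recursive approach mentioned above could look roughly like the sketch below. The class and method names (RecursiveWalk, collect, nodeNames) are made up for illustration; only Jsoup.parse, Node.childNodes() and Node.nodeName() come from Jsoup itself.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Node;

import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of Jsoup.
public class RecursiveWalk {

    // Adds the nodeName() of every descendant of 'node' to 'names', depth-first.
    static void collect(Node node, List<String> names) {
        for (Node child : node.childNodes()) {
            names.add(child.nodeName());
            collect(child, names); // descend one level further
        }
    }

    // Parses the HTML and returns the names of all nodes below the Document.
    static List<String> nodeNames(String html) {
        Document doc = Jsoup.parse(html);
        List<String> names = new ArrayList<>();
        collect(doc, names);
        return names;
    }

    public static void main(String[] args) {
        System.out.println(nodeNames("<p>Hello <b>world</b></p>"));
    }
}
```

Note that childNodes() returns text nodes as well as elements, so names such as #text show up in the result alongside tag names.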


Answer 1:


From Document (and any Node subclass), you can use the traverse(NodeVisitor) method.

For example:

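document.traverse(new NodeVisitor() {
    // Called when a node is first reached, before its children.
    @Override
    public void head(Node node, int depth) {
        System.out.println("Entering tag: " + node.nodeName());
    }
    // Called after all of the node's children have been visited.
    @Override
    public void tail(Node node, int depth) {
        System.out.println("Exiting tag: " + node.nodeName());
    }
});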
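
As a self-contained variation on the snippet above, the following sketch uses the depth argument to build an indented outline of the elements under body. The class name TraverseOutline and the method outline are illustrative names, not Jsoup API; the traversal is started from doc.body() on the assumption that only the body content is of interest.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.select.NodeVisitor;

// Hypothetical helper class, not part of Jsoup.
public class TraverseOutline {

    // Returns one line per element under <body>, indented two spaces per depth level.
    static String outline(String html) {
        Document doc = Jsoup.parse(html);
        StringBuilder sb = new StringBuilder();
        doc.body().traverse(new NodeVisitor() {
            @Override
            public void head(Node node, int depth) {
                // Skip text nodes, comments, etc.; keep elements only.
                if (node instanceof Element) {
                    for (int i = 0; i < depth; i++) {
                        sb.append("  ");
                    }
                    sb.append(node.nodeName()).append('\n');
                }
            }

            @Override
            public void tail(Node node, int depth) {
                // Nothing to do when leaving a node.
            }
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(outline("<ul><li>a</li><li>b</li></ul>"));
    }
}
```

The depth is counted from the node on which traverse is called, so body itself is at depth 0 here.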



Answer 2:


1) You can select all elements in the document body using the * selector.

Elements elements = document.body().select("*");

2) To retrieve the text of each element individually, use the Element.ownText() method.

for (Element element : elements) {
  System.out.println(element.ownText());
}

3) To modify the content of each element individually, use Element.html(String strHtml), which clears any existing inner HTML in the element and replaces it with the parsed HTML.

element.html(strHtml);
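
Put together, the three steps might look like the sketch below. The class SelectAll and its methods ownTexts and replaceInner are hypothetical names chosen for this example; pretty-printing is switched off only so that the serialized output is easy to compare.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of Jsoup.
public class SelectAll {

    // Steps 1 and 2: select every element under <body> (body included)
    // and record "tagName=ownText" for each one.
    static List<String> ownTexts(String html) {
        Document doc = Jsoup.parse(html);
        List<String> out = new ArrayList<>();
        for (Element el : doc.body().select("*")) {
            out.add(el.nodeName() + "=" + el.ownText());
        }
        return out;
    }

    // Step 3: replace the inner HTML of the first element matching cssQuery
    // and return the resulting body HTML.
    static String replaceInner(String html, String cssQuery, String newHtml) {
        Document doc = Jsoup.parse(html);
        doc.outputSettings().prettyPrint(false); // keep the output on one line
        Element target = doc.selectFirst(cssQuery);
        if (target != null) {
            target.html(newHtml);
        }
        return doc.body().html();
    }

    public static void main(String[] args) {
        String html = "<div>Hi <span>there</span></div>";
        System.out.println(ownTexts(html));
        System.out.println(replaceInner(html, "span", "<b>you</b>"));
    }
}
```

Because select("*") is called on body, the body element itself is part of the result, and ownText() returns only the text directly inside an element, not the text of its children.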

Hope this will help you. Thank you!




Answer 3:


You can use the following code:

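import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class JsoupDepthFirst {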

    private static String htmlTags(Document doc) {
        StringBuilder sb = new StringBuilder();
        htmlTags(doc.children(), sb);
        return sb.toString();
    }

    private static void htmlTags(Elements elements, StringBuilder sb) {
        for(Element el:elements) {
            if(sb.length() > 0){
                sb.append(",");
            }
            sb.append(el.nodeName());
            htmlTags(el.children(), sb);
            sb.append(",").append(el.nodeName());
        }
    }

    public static void main(String... args){
        String s = "<html><head>this is head </head><body>this is body</body></html>";
        Document doc = Jsoup.parse(s);
        System.out.println(htmlTags(doc));
    }
}


Source: https://stackoverflow.com/questions/10111511/how-i-can-traverse-the-html-tree-using-jsoup
