Question
I think this question has been asked before, but I could not find anything.
Starting from the Document element in Jsoup, how can I traverse all elements in the HTML content?
I was reading the documentation and thinking about using the childNodes() method, but as far as I understand it only returns the nodes one level below. I think I could use recursion with this method, but I want to know if there is a more appropriate/native way to do this.
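For reference, the recursive approach mentioned above could look roughly like the sketch below. The class name RecursiveWalk and the helper collect are hypothetical names chosen for illustration; only Jsoup.parse, childNodes() and nodeName() come from jsoup itself.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Node;

import java.util.ArrayList;
import java.util.List;

public class RecursiveWalk {
    // Hypothetical helper: records the name of every node in pre-order
    // (a node first, then each of its children, depth-first).
    static void collect(Node node, List<String> names) {
        names.add(node.nodeName());
        for (Node child : node.childNodes()) {
            collect(child, names);
        }
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse(
                "<html><head><title>t</title></head><body><p>hi</p></body></html>");
        List<String> names = new ArrayList<>();
        collect(doc, names);
        // Text nodes are visited too; jsoup reports them as "#text".
        System.out.println(names);
    }
}
```

Note that childNodes() returns every kind of node (elements, text, comments), so the walk sees more than just tags.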
Answer 1:
On a Document (or any other Node subclass), you can call the traverse(NodeVisitor) method, which walks the node and all of its descendants depth-first.
For example:
    document.traverse(new NodeVisitor() {
        public void head(Node node, int depth) {
            System.out.println("Entering tag: " + node.nodeName());
        }
        public void tail(Node node, int depth) {
            System.out.println("Exiting tag: " + node.nodeName());
        }
    });
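A self-contained variant of the same idea, as a minimal sketch: it uses the depth argument to build an indented outline and skips non-element nodes. The class name TraverseOutline and the method outline are hypothetical; the assumption that the Document root is visited at depth 0 follows from how traverse reports depth relative to the starting node.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.select.NodeVisitor;

public class TraverseOutline {
    // Hypothetical helper: one line per element, indented two spaces per level.
    static String outline(Document doc) {
        final StringBuilder sb = new StringBuilder();
        doc.traverse(new NodeVisitor() {
            @Override
            public void head(Node node, int depth) {
                // Skip text/comment nodes and the Document root itself (depth 0).
                if (node instanceof Element && !(node instanceof Document)) {
                    for (int i = 1; i < depth; i++) {
                        sb.append("  ");
                    }
                    sb.append(node.nodeName()).append('\n');
                }
            }

            @Override
            public void tail(Node node, int depth) {
                // Nothing to do when leaving a node in this sketch.
            }
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse(
                "<html><head></head><body><p>hi</p></body></html>");
        System.out.print(outline(doc));
    }
}
```

The head callback fires when a node is first reached and tail after all of its descendants have been visited, so anything that needs "closing" behaviour belongs in tail.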
Answer 2:
1) You can select all elements using the * selector (here, the body element and everything inside it).
Elements elements = document.body().select("*");
2) To retrieve the text of each element individually, use the Element.ownText() method.
    for (Element element : elements) {
        System.out.println(element.ownText());
    }
3) To modify the content of an individual element, use Element.html(String strHtml). (It clears any existing inner HTML in the element and replaces it with the parsed HTML.)
    element.html(strHtml);
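Put together, the three steps above might look like the following sketch. The class name SelectAllExample and the helper ownTexts are hypothetical names for illustration. If the head section matters as well, document.getAllElements() returns every element in the document rather than only those under body.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

public class SelectAllExample {
    // Hypothetical helper: "tag: own text" for every element under <body>
    // that directly owns some text.
    static List<String> ownTexts(Document doc) {
        List<String> texts = new ArrayList<>();
        for (Element el : doc.body().select("*")) {
            String own = el.ownText();
            if (!own.isEmpty()) {
                texts.add(el.tagName() + ": " + own);
            }
        }
        return texts;
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse("<div>outer <p>inner <b>bold</b></p></div>");
        System.out.println(ownTexts(doc));

        // html(String) discards the element's existing children, so the <b>
        // inside <p> is gone after this call.
        Element p = doc.select("p").first();
        p.html("<i>replaced</i>");
        System.out.println(doc.body().html());
    }
}
```

Because html(String) replaces an element's children, calling it on a parent inside the loop also removes descendants that the loop has not reached yet; applying it only to the specific elements you mean to change avoids that.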
Hope this will help you. Thank you!
Answer 3:
You can use the following code, which walks the element tree depth-first by recursing over children():
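    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;

    public class JsoupDepthFirst {
        // Returns a comma-separated list of tag names; each name appears once
        // when the element is entered and once when it is left.
        private static String htmlTags(Document doc) {
            StringBuilder sb = new StringBuilder();
            htmlTags(doc.children(), sb);
            return sb.toString();
        }

        // Depth-first recursion over child elements (text nodes are not visited).
        private static void htmlTags(Elements elements, StringBuilder sb) {
            for (Element el : elements) {
                if (sb.length() > 0) {
                    sb.append(",");
                }
                sb.append(el.nodeName());             // entering the element
                htmlTags(el.children(), sb);
                sb.append(",").append(el.nodeName()); // leaving the element
            }
        }

        public static void main(String... args) {
            String s = "<html><head>this is head </head><body>this is body</body></html>";
            Document doc = Jsoup.parse(s);
            System.out.println(htmlTags(doc));
        }
    }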
Source: https://stackoverflow.com/questions/10111511/how-i-can-traverse-the-html-tree-using-jsoup