What is the most efficient way to iterate through all DOM elements in Java?
Something like this but for every single DOM elements on current org.w3c.dom.Docume
I also stumbled over this problem recently. Here is my solution. I wanted to avoid recursion, so I used a while loop.
Because of the adds and removes in arbitrary places on the list,
I went with the LinkedList
implementation.
/* traverses tree starting with given node */
private static List<Node> traverse(Node n)
{
return traverse(Arrays.asList(n));
}
/* traverses tree starting with given nodes */
private static List<Node> traverse(List<Node> nodes)
{
List<Node> open = new LinkedList<Node>(nodes);
List<Node> visited = new LinkedList<Node>();
ListIterator<Node> it = open.listIterator();
while (it.hasNext() || it.hasPrevious())
{
Node unvisited;
if (it.hasNext())
unvisited = it.next();
else
unvisited = it.previous();
it.remove();
List<Node> children = getChildren(unvisited);
for (Node child : children)
it.add(child);
visited.add(unvisited);
}
return visited;
}
private static List<Node> getChildren(Node n)
{
List<Node> children = asList(n.getChildNodes());
Iterator<Node> it = children.iterator();
while (it.hasNext())
if (it.next().getNodeType() != Node.ELEMENT_NODE)
it.remove();
return children;
}
private static List<Node> asList(NodeList nodes)
{
List<Node> list = new ArrayList<Node>(nodes.getLength());
for (int i = 0, l = nodes.getLength(); i < l; i++)
list.add(nodes.item(i));
return list;
}
Basically you have two ways to iterate over all elements:
1. Using recursion (the most common way I think):
public static void main(String[] args) throws SAXException, IOException,
ParserConfigurationException, TransformerException {
DocumentBuilderFactory docBuilderFactory = DocumentBuilderFactory
.newInstance();
DocumentBuilder docBuilder = docBuilderFactory.newDocumentBuilder();
Document document = docBuilder.parse(new File("document.xml"));
doSomething(document.getDocumentElement());
}
public static void doSomething(Node node) {
// do something with the current node instead of System.out
System.out.println(node.getNodeName());
NodeList nodeList = node.getChildNodes();
for (int i = 0; i < nodeList.getLength(); i++) {
Node currentNode = nodeList.item(i);
if (currentNode.getNodeType() == Node.ELEMENT_NODE) {
//calls this method for all the children which is Element
doSomething(currentNode);
}
}
}
2. Avoiding recursion using getElementsByTagName()
method with *
as parameter:
public static void main(String[] args) throws SAXException, IOException,
ParserConfigurationException, TransformerException {
DocumentBuilderFactory docBuilderFactory = DocumentBuilderFactory
.newInstance();
DocumentBuilder docBuilder = docBuilderFactory.newDocumentBuilder();
Document document = docBuilder.parse(new File("document.xml"));
NodeList nodeList = document.getElementsByTagName("*");
for (int i = 0; i < nodeList.getLength(); i++) {
Node node = nodeList.item(i);
if (node.getNodeType() == Node.ELEMENT_NODE) {
// do something with the current element
System.out.println(node.getNodeName());
}
}
}
I think these ways are both efficient.
Hope this helps.
for (int i = 0; i < nodeList.getLength(); i++)
change to
for (int i = 0, len = nodeList.getLength(); i < len; i++)
to be more efficient.
The second way of javanna answer may be the best as it tends to use a flatter, predictable memory model.