Question
I have a collection of XML files, and some of them are pretty big (up to ~50 million element nodes). I am using xmllint to validate those files, which works pretty well even for the huge ones thanks to the streaming API.
xmllint --loaddtd --stream --valid /path/to/huge.xml
I recently learned that xmllint is also capable of doing command line XPath queries, which is very handy.
xmllint --loaddtd --xpath '/root/a/b/c/text()' /path/to/small.xml
However, these XPath queries do not work for the huge XML files. I just receive a "Killed" message after some time. I tried to enable the streaming API, but this just leads to no output at all.
xmllint --loaddtd --stream --xpath '/root/a/b/c/text()' /path/to/huge.xml
Is there a way to enable streaming mode when doing XPath queries using xmllint? Are there other/better ways to do command line XPath queries on huge XML files?
Answer 1:
If your XPath expressions are very simple, try xmlcutty.
From the homepage:
xmlcutty is a simple tool for carving out elements from large XML files, fast. Since it works in a streaming fashion, it uses almost no memory and can process around 1G of XML per minute.
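For the element path from the question, a minimal sketch might look like the line below (assuming xmlcutty's -path flag as shown in its README; note that xmlcutty carves out whole elements rather than text() nodes, so the surrounding <c> tags remain in the output):
$ xmlcutty -path /root/a/b/c /path/to/huge.xml > c-elements.xml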
Answer 2:
Changing the ulimits might work. Try this:
$ ulimit -Sv 500000
$ xmllint (...your command)
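For reference, -S sets a soft limit and -v caps virtual memory in kilobytes, so the line above restricts the shell and every process it starts to roughly 500 MB. A sketch combining this with the query from the question (whether xmllint can complete within the cap depends on the file) could be:
$ ulimit -Sv 500000
$ xmllint --loaddtd --xpath '/root/a/b/c/text()' /path/to/huge.xml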
Source: https://stackoverflow.com/questions/30305724/how-to-do-command-line-xpath-queries-in-huge-xml-files