How to do command line XPath queries in huge XML files?

Submitted by 元气小坏坏 on 2019-12-04 18:49:34

Question


I have a collection of XML files, and some of them are pretty big (up to ~50 million element nodes). I am using xmllint for validating those files, which works pretty nicely even for the huge ones thanks to the streaming API.

xmllint --loaddtd --stream --valid /path/to/huge.xml

I recently learned that xmllint is also capable of doing command line XPath queries, which is very handy.

xmllint --loaddtd --xpath '/root/a/b/c/text()' /path/to/small.xml

However, these XPath queries do not work for the huge XML files. I just receive a "Killed" message after some time. I tried to enable the streaming API, but this just leads to no output at all.

xmllint --loaddtd --stream --xpath '/root/a/b/c/text()' /path/to/huge.xml

Is there a way to enable streaming mode when doing XPath queries using xmllint? Are there other/better ways to do command line XPath queries for huge XML files?


Answer 1:


If your XPath expressions are very simple, try xmlcutty.

From the homepage:

xmlcutty is a simple tool for carving out elements from large XML files, fast. Since it works in a streaming fashion, it uses almost no memory and can process around 1G of XML per minute.
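For the path from the question, an invocation would look roughly like the line below. This is only a sketch: the -path flag follows the project README, and xmlcutty carves out whole elements rather than text() nodes, so check xmlcutty -h for the exact options of your installed version.

xmlcutty -path /root/a/b/c /path/to/huge.xml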




Answer 2:


Changing the ulimit might work. Try this:

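# -S sets a soft limit, -v caps the process's virtual memory in kilobytes (so ~500 MB here)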
$ ulimit -Sv 500000
$ xmllint (...your command)


Source: https://stackoverflow.com/questions/30305724/how-to-do-command-line-xpath-queries-in-huge-xml-files
