Parsing (very) large XML files with XmlSlurper

Submitted by 风流意气都作罢 on 2019-12-23 10:53:12

Question


I am fairly new to Groovy and I am trying to read a large XML file (more than 1 GB) using XmlSlurper, which is supposed to work wonders with large files because it doesn't build the whole DOM in memory.

Nevertheless, I keep getting "OutOfMemoryError: Java heap space", which makes me think that I'm obviously doing something wrong. I tried increasing the -Xmx setting, but I would rather solve the underlying problem, since I may have to deal with even bigger files later.

Here is the line of code I used:

def posts = new XmlSlurper().parse(new File("posts.xml"))

Any hints on what's wrong?

Thanks in advance,

Jérémie.


Answer 1:


Groovy's XmlSlurper uses a SAX parser under the hood, but it still loads a model of the entire document into memory...

To avoid the OutOfMemoryError, you probably need to either increase your memory allowance (as you say, with the -Xmx setting) or write your own SAX-based parser (a SAX handler) that pulls just the data you need from the document.
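
As a rough illustration of the second option, here is a minimal sketch of a hand-written SAX handler in Groovy, using only the JDK's standard SAX classes. The element name `row`, the attribute `Id`, the `RowHandler` class and its `onRow` callback are all assumptions made up for this example; the question doesn't show what posts.xml actually looks like. Each matching element is handed to the callback as it is read and then forgotten, so nothing accumulates unless the callback itself keeps it.

```groovy
import javax.xml.parsers.SAXParserFactory
import org.xml.sax.Attributes
import org.xml.sax.InputSource
import org.xml.sax.helpers.DefaultHandler

// Hypothetical handler: reacts to each <row> element as the parser streams past it.
class RowHandler extends DefaultHandler {
    Closure onRow      // callback invoked with the Id attribute of each <row>
    int rowCount = 0   // running count; the rows themselves are never stored here

    @Override
    void startElement(String uri, String localName, String qName, Attributes attributes) {
        // With the default (non-namespace-aware) factory, the element name arrives in qName.
        if (qName == 'row') {
            rowCount++
            onRow?.call(attributes.getValue('Id'))
        }
    }
}

// Small in-memory sample standing in for the real file (assumed layout).
def sampleXml = '''<posts>
  <row Id="1" Title="First"/>
  <row Id="2" Title="Second"/>
</posts>'''

// Collecting ids in a list is only for this demo; on a huge file, process each one in place.
def seen = []
def handler = new RowHandler(onRow: { id -> seen << id })

def parser = SAXParserFactory.newInstance().newSAXParser()
parser.parse(new InputSource(new StringReader(sampleXml)), handler)

println "rows: ${handler.rowCount}, ids: ${seen}"
```

For a file on disk, the same handler could be passed to `parser.parse(new File("posts.xml"), handler)`. The trade-off compared with XmlSlurper is that you lose GPath navigation and have to track any state you need (current element, nesting depth) yourself.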




Answer 2:


I'm a bit late to this party, but I've been having the same issue too.

I made a proposal on the groovy-user mailing list to add something like Perl's XML::Twig module to XmlSlurper.

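// XPathXmlSlurper2 is the class proposed in the post linked below, not part of Groovy itself.
def xpathSlurper = new XPathXmlSlurper2()
// Handler called for each element matching the path; `it` is the matched node.
def c = { twig, it ->
    println it.text().trim()
    twig.purgeCurrent()  // presumably frees the processed node, like purge in XML::Twig
}
// `xpath` must hold the path expression to match; it is not defined in this snippet.
xpathSlurper.setTwigRootHandler(xpath, c)
def fdata = xpathSlurper.parse(new File("test.xml"))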

I've attached the sample code here: http://groovy.329449.n5.nabble.com/first-step-toward-Xml-Twig-for-Groovy-groovy-util-XPathXmlSlurper2-groovy-td4923577.html

I hope this helps!



Source: https://stackoverflow.com/questions/9977418/parsing-very-large-xml-files-with-xmlslurper
