iterparse | 易学教程

ElementTree iterparse strategy

Question: I have to handle large XML documents (up to 1 GB) and parse them with Python. I am using the iterparse() function (SAX-style parsing). My concern is the following. Imagine you have an XML document like this:

    <?xml version="1.0" encoding="UTF-8" ?>
    <families>
      <family>
        <name>Simpson</name>
        <members>
          <name>Homer</name>
          <name>Marge</name>
          <name>Bart</name>
        </members>
      </family>
      <family>
        <name>Griffin</name>
        <members>
          <name>Peter</name>
          <name>Brian</name>
          <name>Meg</name>
        </members>
      </family>
      <
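The excerpt is cut off before the actual concern is stated, so the following is only a minimal sketch of one way to stream such a document, not the asker's or any answerer's method. It assumes the standard-library xml.etree.ElementTree module; the helper name iter_families and the SAMPLE data (the XML above, closed with an assumed </families> tag) are hypothetical. It waits for the "end" event of each <family>, at which point the whole subtree is available, and reads the family name and the member names from different paths.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample: the question's XML, closed with an assumed </families>.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8" ?>
<families>
  <family>
    <name>Simpson</name>
    <members>
      <name>Homer</name>
      <name>Marge</name>
      <name>Bart</name>
    </members>
  </family>
  <family>
    <name>Griffin</name>
    <members>
      <name>Peter</name>
      <name>Brian</name>
      <name>Meg</name>
    </members>
  </family>
</families>"""


def iter_families(source):
    """Yield (family_name, [member_names]) one <family> at a time."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "family":
            continue
        # At the "end" event the whole <family> subtree has been parsed.
        family_name = elem.findtext("name")  # direct child <name> only
        members = [m.text for m in elem.findall("members/name")]
        yield family_name, members
        elem.clear()  # drop the subtree once the caller has consumed it


families = list(iter_families(io.BytesIO(SAMPLE)))
```

Because findtext("name") only looks at direct children, the family's own <name> is not confused with the <name> elements nested under <members>.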

Why is lxml.etree.iterparse() eating up all my memory?

Question: This eventually consumes all my available memory, and then the process is killed. I've tried changing the tag from schedule to 'smaller' tags, but that didn't make a difference. What am I doing wrong, and how can I process this large file with iterparse()?

    import lxml.etree

    for schedule in lxml.etree.iterparse('really-big-file.xml', tag='schedule'):
        print("why does this consume all my memory?")

I can easily cut it up and process it in smaller chunks, but that's uglier than I'd like.

Answer 1: As
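The answer is truncated here, so what follows is not necessarily what it goes on to recommend. It is a minimal sketch of a commonly used pattern: iterparse() still builds the full tree behind the scenes, so memory only stays flat if handled elements are released. To stay self-contained, the sketch swaps in the standard-library xml.etree.ElementTree instead of lxml (so there is no tag= filter), assumes every <schedule> is a direct child of the root, and uses a hypothetical helper name process_schedules and made-up sample data.

```python
import io
import xml.etree.ElementTree as ET


def process_schedules(source, tag="schedule"):
    """Count matching elements while keeping the in-memory tree small."""
    # Ask for "start" events too, so the first event hands us the root element.
    context = ET.iterparse(source, events=("start", "end"))
    _event, root = next(context)

    count = 0
    for event, elem in context:  # iterparse yields (event, element) pairs
        if event == "end" and elem.tag == tag:
            count += 1  # real per-element work would go here
            # Assumption: each <schedule> is a direct child of the root, so
            # clearing the root releases the subtree that was just handled.
            root.clear()
    return count, len(root)


# Hypothetical stand-in for 'really-big-file.xml'.
SAMPLE = b"<schedules>" + b"<schedule><id>1</id></schedule>" * 1000 + b"</schedules>"
result = process_schedules(io.BytesIO(SAMPLE))
```

Note that the loop variable in the question receives an (event, element) tuple, not the element itself. With lxml, the same idea is usually expressed by calling elem.clear() and then deleting already-processed siblings through elem.getprevious() and elem.getparent().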

using lxml and iterparse() to parse a big (+- 1Gb) XML file

Question: I have to parse a 1 GB XML file with a structure such as the one below and extract the text within the tags "Author" and "Content":

    <Database>
      <BlogPost>
        <Date>MM/DD/YY</Date>
        <Author>Last Name, Name</Author>
        <Content>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Maecenas dictum dictum vehicula.</Content>
      </BlogPost>
      <BlogPost>
        <Date>MM/DD/YY</Date>
        <Author>Last Name, Name</Author>
        <Content>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Maecenas dictum dictum vehicula.<
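As a rough illustration of the extraction described above, here is a sketch that streams the file and yields one (author, content) pair per <BlogPost>. It is an assumption-laden example, not an answer from the original thread: it uses the standard-library xml.etree.ElementTree rather than lxml, the helper name iter_posts is hypothetical, and the sample data is a shortened, properly closed version of the structure shown.

```python
import io
import xml.etree.ElementTree as ET


def iter_posts(source):
    """Yield (author, content) for each <BlogPost>, one at a time."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "BlogPost":
            # The post is complete at its "end" event, so both children exist.
            yield elem.findtext("Author"), elem.findtext("Content")
            elem.clear()  # free the post's text and children


# Hypothetical, shortened sample with an assumed closing </Database> tag.
POST = (
    b"<BlogPost><Date>MM/DD/YY</Date>"
    b"<Author>Last Name, Name</Author>"
    b"<Content>Lorem ipsum</Content></BlogPost>"
)
SAMPLE = b"<Database>" + POST * 2 + b"</Database>"
posts = list(iter_posts(io.BytesIO(SAMPLE)))
```

Since iter_posts is a generator, the caller can write each pair to disk or a database as it arrives instead of collecting everything in a list. The emptied <BlogPost> shells remain attached to the root, which is usually small but can add up over millions of posts.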