Parsing a large XML file with Python lxml and iterparse

被撕碎了的回忆 2021-01-15 09:04

I'm attempting to write a parser using lxml and its iterparse method to step through a very large XML file containing many items.

My file is of the format:

1 Answer
  • 2021-01-15 09:52

    The core parser reads the entire XML document regardless. etree.iterparse is only a generator-style view over that parse, and it offers simple filtering by tag name (see the docstring: http://lxml.de/api/lxml.etree.iterparse-class.html). If you need more complex filtering, you have to implement it yourself.

    One solution is to register for the "start" event as well:

    etree.iterparse(source, events=("start", "end"), tag="item")
    

    and keep a boolean flag so you can tell whether an "end" event belongs to the top-level "item" or to the nested "item/url/item".
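
    The idea above can be sketched as follows. This is only an illustration under assumptions: the question's actual file layout is not shown, so the sample document, the "title" element, and the helper name iter_top_level_items are all hypothetical. It also uses a depth counter in place of the boolean flag, which does the same job and would also cope with deeper nesting.

```python
import io

from lxml import etree

# Hypothetical sample data: the real file format is not shown in the question.
SAMPLE = b"""<root>
  <item>
    <title>first</title>
    <url><item>http://example.com/a</item></url>
  </item>
  <item>
    <title>second</title>
    <url><item>http://example.com/b</item></url>
  </item>
</root>"""


def iter_top_level_items(source):
    """Yield (title, url) for each outer <item>, skipping nested <item> elements."""
    depth = 0  # number of <item> elements currently open
    for event, elem in etree.iterparse(source, events=("start", "end"), tag="item"):
        if event == "start":
            depth += 1
            continue
        depth -= 1
        if depth > 0:
            # End of a nested item/url/item: keep it so the outer item can read it.
            continue
        yield elem.findtext("title"), elem.findtext("url/item")
        # Release the finished item and any already-processed siblings.
        elem.clear()
        while elem.getprevious() is not None:
            del elem.getparent()[0]


results = list(iter_top_level_items(io.BytesIO(SAMPLE)))
print(results)
```

    Clearing only at the end of the outer item matters here: clearing a nested item as soon as it ends would discard its text before the outer item is processed.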
