I'm attempting to write a parser using lxml and its iterparse method to step through a very large XML file containing many items.
My file is of the format:
The entire XML is parsed anyway by the core implementation. etree.iterparse is just a generator-style view that provides simple filtering by tag name (see the docstring: http://lxml.de/api/lxml.etree.iterparse-class.html). If you want more complex filtering, you have to do it yourself.
One solution is to register for the "start" event as well:
iterparse(self, source, events=("start", "end",), tag="item")
and keep a bool so you can tell whether an "end" event belongs to the outer "item" or to a nested "item/url/item".
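Here is a minimal sketch of that idea. Since the actual file format isn't shown above, the sample XML is made up, and `iter_top_level_items` is just a hypothetical helper name. It also uses a depth counter instead of a single bool, which does the same job but keeps working if items nest more than one level deep.

```python
from io import BytesIO

from lxml import etree

# Hypothetical sample data: the real file layout is not shown in the question.
SAMPLE = b"""<root>
  <item><title>first</title><url><item>nested-1</item></url></item>
  <item><title>second</title><url><item>nested-2</item></url></item>
</root>"""


def iter_top_level_items(source, tag="item"):
    """Yield only the outermost <item> elements, skipping nested ones."""
    depth = 0  # number of <item> elements currently open
    for event, elem in etree.iterparse(source, events=("start", "end"), tag=tag):
        if event == "start":
            depth += 1
            continue
        # event == "end"
        depth -= 1
        if depth == 0:
            # This "end" closes an outer item, not an item/url/item.
            yield elem
            # Free the subtree once the caller has finished with it.
            elem.clear()


titles = [item.findtext("title") for item in iter_top_level_items(BytesIO(SAMPLE))]
print(titles)
```

The nested "item" elements still fire start/end events, but they only move the counter; nothing is yielded until it drops back to zero. Note that `elem.clear()` runs only when the generator is resumed, so read what you need from each item before asking for the next one.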