I'm using lxml's iterparse to parse some big XML files (3-5 GB). Since some of these files contain invalid characters, an lxml.etree.XMLSyntaxError
is thrown.
When you say invalid characters, do you mean Unicode characters? If so, you can try a parser created with recover=True:
lxml.etree.XMLParser(encoding='UTF-8', recover=True)
If you mean malformed XML, this may not be enough on its own. If you can post your traceback, we can see the nature of the XMLSyntaxError, which will provide more information.
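
To make that concrete, here is a minimal sketch of both steps: collecting the error details from a strict parse (the information a traceback would show), and then retrying with recover=True. The sample document and the helper names `strict_errors` and `parse_recovering` are made up for illustration. How much of the document survives recovery depends on your libxml2 version, so check the result rather than trusting it.

```python
from lxml import etree

# Hypothetical sample: the raw control character 0x08 is not allowed in XML 1.0.
BROKEN_XML = b"<root><item>ok</item><item>bad \x08 char</item></root>"


def strict_errors(data):
    """Parse with a default (strict) parser and return the logged errors."""
    parser = etree.XMLParser()
    try:
        etree.fromstring(data, parser)
    except etree.XMLSyntaxError:
        # Each log entry records where the parser gave up and why.
        return [(e.line, e.column, e.message) for e in parser.error_log]
    return []


def parse_recovering(data):
    """Parse with recover=True so libxml2 tries to continue past errors."""
    parser = etree.XMLParser(encoding="UTF-8", recover=True)
    return etree.fromstring(data, parser)


errors = strict_errors(BROKEN_XML)
for line, column, message in errors:
    print(f"line {line}, column {column}: {message}")

root = parse_recovering(BROKEN_XML)
# Recovery may drop or truncate content, so inspect what actually came back.
print(etree.tostring(root))
```

Since the question uses iterparse: recent lxml versions also appear to accept recover=True as a keyword argument to iterparse itself, but verify that against the version you have installed.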