Is there a way to recover iterparse on invalid Char values?

感动是毒 2021-01-21 15:39

I'm using lxml's iterparse to parse some big XML files (3-5 GB). Since some of these files have invalid characters, an lxml.etree.XMLSyntaxError is thr

1 answer
  •  野趣味 (original poster)
    2021-01-21 16:27

    When you say invalid characters, do you mean Unicode characters? If so, you can try:

    import lxml.etree
    parser = lxml.etree.XMLParser(encoding='UTF-8', recover=True)
    

    If you mean malformed XML, then this won't work. If you can post your traceback, we can see the nature of the XMLSyntaxError, which will tell us more about the problem.
