lxml unicode entity parse problems

前端 未结 1 737
时光说笑
时光说笑 2021-01-21 14:55

I\'m using lxml as follows to parse an exported XML file from another system:

xmldoc = open(filename)
etree.parse(xmldoc)

But im getting:

相关标签:
1条回答
  • 2021-01-21 15:25

    eacute is not a predefined entity in XML. To include an &eacute; entity reference in an XML file, it must have a <!DOCTYPE> declaration pointing to a DTD (such as an XHTML 1.0 DTD) that defines the entity.

    If the XML uses &eacute; but doesn't have a <!DOCTYPE>, it is not well-formed and the system that exported it needs to be fixed.

    (There isn't a good reason to use an entity reference to represent é in an XML file. The character reference &#233; is understood everywhere without entity definitions, if the file can't simply include a raw UTF-8 é for some reason.)

    0 讨论(0)
提交回复
热议问题