lxml unicode entity parse problems

时光说笑 2021-01-21 14:55

I'm using lxml as follows to parse an XML file exported from another system:

from lxml import etree

with open(filename, "rb") as xmldoc:  # binary mode lets the parser detect the encoding
    tree = etree.parse(xmldoc)

But I'm getting:

1 answer
  •  生来不讨喜
    2021-01-21 15:25

    &eacute; is not a predefined entity in XML. To include an &eacute; entity reference, an XML file must have a <!DOCTYPE> declaration pointing to a DTD (such as an XHTML 1.0 DTD) that defines the entity.

    If the XML uses &eacute; but doesn't have a <!DOCTYPE> declaration, it is not well-formed and the system that exported it needs to be fixed.

    (There isn't a good reason to use an entity reference to represent é in an XML file. The character reference &#233; is understood everywhere without entity definitions, if the file can't simply include a raw UTF-8 é for some reason.)
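
    To make the three cases concrete, here is a minimal sketch using lxml's default parser settings. The document strings and variable names are made up for illustration, and the internal DTD subset in the last case is just one way to declare the entity; a real export would more likely point at an external DTD.

```python
from lxml import etree

# Case 1: named entity with no DTD declaring it. The document is not
# well-formed, so lxml refuses to parse it.
undeclared = b"<doc>caf&eacute;</doc>"
try:
    etree.fromstring(undeclared)
    undeclared_parsed = True
except etree.XMLSyntaxError:
    undeclared_parsed = False

# Case 2: numeric character reference. No entity definition is needed.
numeric = etree.fromstring(b"<doc>caf&#233;</doc>")

# Case 3: the same named entity, now declared in an internal DTD subset.
declared = etree.fromstring(
    b'<!DOCTYPE doc [<!ENTITY eacute "&#233;">]>'
    b"<doc>caf&eacute;</doc>"
)

print(undeclared_parsed)
print(numeric.text)
print(declared.text)
```

    Only the first document fails to parse. The other two should yield the same text content.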
