问题
I have a big XML file with several article
nodes. I have included only one with the problem. I try to parse it in Python to filter some data and I get the error
File "<string>", line unknown
ParseError: undefined entity Ö: line 90, column 17
Sample of the XML file
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE dblp SYSTEM "dblp.dtd">
<dblp>
<article mdate="2019-10-25" key="tr/gte/TR-0146-06-91-165" publtype="informal">
<author>Alejandro P. Buchmann</author>
<author>M. Tamer Özsu</author>
<author>Dimitrios Georgakopoulos</author>
<title>Towards a Transaction Management System for DOM.</title>
<journal>GTE Laboratories Incorporated</journal>
<volume>TR-0146-06-91-165</volume>
<month>June</month>
<year>1991</year>
<url>db/journals/gtelab/index.html#TR-0146-06-91-165</url>
</article>
</dblp>
From my search in Google, I found that this kind of error appears if you have issues in the node names. However, the line with the error is the second author
, in the text.
This is my Python code
with open('xaa.xml', 'r') as xml_file:
xml_tree = etree.parse(xml_file)
回答1:
The declaration of the Ouml
entity is presumably in the DTD (dblp.dtd), but ElementTree does not support external DTDs. ElementTree only recognizes entities declared directly in the XML file (in the "internal subset"). This is a working example:
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE dblp [
<!ENTITY Ouml 'Ö'>
]>
<dblp>
<article mdate="2019-10-25" key="tr/gte/TR-0146-06-91-165" publtype="informal">
<author>Alejandro P. Buchmann</author>
<author>M. Tamer Özsu</author>
<author>Dimitrios Georgakopoulos</author>
<title>Towards a Transaction Management System for DOM.</title>
<journal>GTE Laboratories Incorporated</journal>
<volume>TR-0146-06-91-165</volume>
<month>June</month>
<year>1991</year>
<url>db/journals/gtelab/index.html#TR-0146-06-91-165</url>
</article>
</dblp>
To parse the XML file in the question without errors, you need a more powerful XML library that supports external DTDs. lxml is a good choice for that.
来源:https://stackoverflow.com/questions/60746298/parseerror-undefined-entity-while-parsing-xml-file-in-python