Parsing XML with Python using xml.sax, but my code fails to catch entities. Why aren't skippedEntity() or resolveEntity() called in the following code:
import
Here is a modified version of your program that I hope makes sense. It demonstrates a case where all TestHandler methods are called.
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_external_ges

# Inheriting from EntityResolver and DTDHandler is not necessary
class TestHandler(ContentHandler):

    # This method is only called for external entities. Must return a value
    # (a system ID or an InputSource).
    def resolveEntity(self, publicID, systemID):
        print("TestHandler.resolveEntity(): %s %s" % (publicID, systemID))
        return systemID

    def skippedEntity(self, name):
        print("TestHandler.skippedEntity(): %s" % name)

    def unparsedEntityDecl(self, name, publicID, systemID, ndata):
        print("TestHandler.unparsedEntityDecl(): %s %s" % (publicID, systemID))

    def startElement(self, name, attrs):
        summary = attrs.get('summary', '')
        print('TestHandler.startElement():', summary)

def main(xml_string):
    try:
        parser = xml.sax.make_parser()
        # Python 3.7.1+ no longer loads external entities (including the
        # external DTD) by default; re-enable that so test.dtd is read.
        # test.dtd is looked up relative to the current working directory.
        parser.setFeature(feature_external_ges, True)
        curHandler = TestHandler()
        parser.setContentHandler(curHandler)
        parser.setEntityResolver(curHandler)
        parser.setDTDHandler(curHandler)
        stream = io.StringIO(xml_string)
        parser.parse(stream)
        stream.close()
    except xml.sax.SAXParseException as e:
        print("*** PARSER error: %s" % e)

XML = """<!DOCTYPE test SYSTEM "test.dtd">
<test summary='step: &num;'>Entity: &not;</test>
"""

main(XML)
test.dtd contains:
<!ENTITY num "FOO">
<!ENTITY pic SYSTEM 'bar.gif' NDATA gif>
Output:
TestHandler.resolveEntity(): None test.dtd
TestHandler.unparsedEntityDecl(): None bar.gif
TestHandler.startElement(): step: FOO
TestHandler.skippedEntity(): not
Addition
As far as I can tell, skippedEntity is called only when an external DTD is used (at least I can't come up with a counterexample; it would be nice if the documentation were a little clearer).
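One way to see this behaviour is the following minimal sketch. It assumes Python 3 and the default expat-based parser. When the document declares an external subset that is never read (here feature_external_ges is switched off explicitly), an undeclared entity in element content is passed to skippedEntity(). Without any DTD, the same reference is a fatal "undefined entity" error instead. SkipRecorder, skipped_entities() and missing.dtd are illustrative names, not part of the original code.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_external_ges


class SkipRecorder(ContentHandler):
    """Hypothetical helper: records every name passed to skippedEntity()."""

    def __init__(self):
        super().__init__()
        self.skipped = []

    def skippedEntity(self, name):
        self.skipped.append(name)


def skipped_entities(xml_string):
    """Return the skipped entity names, or the parser's error message."""
    handler = SkipRecorder()
    parser = xml.sax.make_parser()
    # Never fetch the external DTD, so its declarations stay unknown.
    parser.setFeature(feature_external_ges, False)
    parser.setContentHandler(handler)
    try:
        parser.parse(io.StringIO(xml_string))
    except xml.sax.SAXParseException as e:
        return "error: %s" % e.getMessage()
    return handler.skipped


# A DOCTYPE with an external subset that is never read ("missing.dtd" is a
# made-up name): the undeclared &not; is reported as a skipped entity.
with_dtd = '<!DOCTYPE test SYSTEM "missing.dtd"><test>Entity: &not;</test>'
# No DTD at all: the same reference is a well-formedness error instead.
without_dtd = '<test>Entity: &not;</test>'

print(skipped_entities(with_dtd))     # ['not']
print(skipped_entities(without_dtd))  # error: undefined entity
```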
Adam said in his answer that resolveEntity is called only for external DTDs. But that is not quite true. resolveEntity is also called when processing a reference to an external entity that is declared in an internal or external DTD subset. For example:
<!DOCTYPE test [
<!ENTITY num SYSTEM "bar.txt">
]>
where the content of bar.txt could be, say, FOO. In this case it is not possible to refer to the entity in an attribute value, because XML forbids references to external entities in attribute values.
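A minimal sketch of that case, assuming Python 3.7.1 or later (so feature_external_ges has to be enabled for external entities to be processed at all). EntityRecorder is a hypothetical handler name. To keep the example self-contained, resolveEntity() does not read bar.txt from disk; it returns an InputSource whose byte stream holds FOO. The entity is declared only in the internal subset, and resolveEntity() is still called when &num; is referenced in element content.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_external_ges
from xml.sax.xmlreader import InputSource


class EntityRecorder(ContentHandler):
    """Hypothetical handler: logs resolveEntity() calls and collects text."""

    def __init__(self):
        super().__init__()
        self.resolved = []
        self.text = []

    def resolveEntity(self, publicID, systemID):
        self.resolved.append((publicID, systemID))
        # Serve the entity body from memory instead of opening bar.txt
        # (an assumption made to keep the sketch self-contained).
        source = InputSource(systemID)
        source.setByteStream(io.BytesIO(b"FOO"))
        return source

    def characters(self, content):
        self.text.append(content)


XML = """<!DOCTYPE test [
<!ENTITY num SYSTEM "bar.txt">
]>
<test>Entity: &num;</test>
"""

handler = EntityRecorder()
parser = xml.sax.make_parser()
parser.setFeature(feature_external_ges, True)  # off by default since 3.7.1
parser.setContentHandler(handler)
parser.setEntityResolver(handler)
parser.parse(io.StringIO(XML))

print(handler.resolved)       # [(None, 'bar.txt')]
print("".join(handler.text))  # Entity: FOO
```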
I think resolveEntity and skippedEntity are only called for external DTDs. I got this to work by modifying the XML.
XML = """<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE test SYSTEM "external.dtd" >
<test summary='step: &foo; &bar;'>Entity: &not;</test>
"""
external.dtd contains two simple entity declarations:
<!ENTITY foo "bar">
<!ENTITY bar "foo">
Also, I got rid of resolveEntity.
This outputs:
TestHandler.startElement(), test : step: bar foo ()
TestHandler.skippedEntity: not
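A self-contained sketch of this setup, assuming Python 3. SummaryHandler and run() are hypothetical names, and the script writes external.dtd and the document into a temporary directory so that the relative SYSTEM id resolves next to the document. It illustrates two points: dropping resolveEntity works because the default entity resolver simply returns the system ID, and on Python 3.7.1+ the external DTD is only read when feature_external_ges is turned on. Without the DTD, &foo; and &bar; are silently dropped from the attribute, while &not; is still reported to skippedEntity().

```python
import os
import tempfile
import xml.sax
from xml.sax.handler import ContentHandler, feature_external_ges

XML = """<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE test SYSTEM "external.dtd" >
<test summary='step: &foo; &bar;'>Entity: &not;</test>
"""

DTD = """<!ENTITY foo "bar">
<!ENTITY bar "foo">
"""


class SummaryHandler(ContentHandler):
    """Hypothetical handler: keeps the summary attribute and skipped names."""

    def __init__(self):
        super().__init__()
        self.summary = None
        self.skipped = []

    def startElement(self, name, attrs):
        self.summary = attrs.get("summary", "")

    def skippedEntity(self, name):
        self.skipped.append(name)


def run(load_external):
    # Both files share one temporary directory, so "external.dtd" is
    # resolved relative to the document's own path.
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, "external.dtd"), "w", encoding="utf-8") as f:
            f.write(DTD)
        doc_path = os.path.join(tmp, "test.xml")
        with open(doc_path, "w", encoding="utf-8") as f:
            f.write(XML)

        handler = SummaryHandler()
        parser = xml.sax.make_parser()
        parser.setFeature(feature_external_ges, load_external)
        parser.setContentHandler(handler)
        # No setEntityResolver(): the default resolver returns the system ID.
        parser.parse(doc_path)
        return handler.summary, handler.skipped


print(run(True))   # ('step: bar foo', ['not'])
print(run(False))  # DTD never read: &foo; and &bar; vanish from the attribute
```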
Hope this helps.