Parsing XML Entity with python xml.sax

挽巷 2021-01-16 09:54

I'm parsing XML with Python's xml.sax, but my code fails to catch entities. Why don't skippedEntity() or resolveEntity() report anything in the following:

import          


        
2 Answers
  • 2021-01-16 10:21

    Here is a modified version of your program that I hope makes sense. It demonstrates a case where all TestHandler methods are called.

    import io
    import xml.sax
    from xml.sax.handler import ContentHandler, feature_external_ges
    
    # Inheriting from EntityResolver and DTDHandler is not necessary
    class TestHandler(ContentHandler):
    
        # This method is only called for external entities. Must return a value.
        def resolveEntity(self, publicID, systemID):
            print("TestHandler.resolveEntity(): %s %s" % (publicID, systemID))
            return systemID
    
        def skippedEntity(self, name):
            print("TestHandler.skippedEntity(): %s" % name)
    
        def unparsedEntityDecl(self, name, publicID, systemID, ndata):
            print("TestHandler.unparsedEntityDecl(): %s %s" % (publicID, systemID))
    
        def startElement(self, name, attrs):
            summary = attrs.get('summary', '')
            print('TestHandler.startElement():', summary)
    
    def main(xml_string):
        try:
            parser = xml.sax.make_parser()
            # Python 3.7.1+ skips external entities (including test.dtd)
            # unless this feature is switched on explicitly.
            parser.setFeature(feature_external_ges, True)
            curHandler = TestHandler()
            parser.setContentHandler(curHandler)
            parser.setEntityResolver(curHandler)
            parser.setDTDHandler(curHandler)
    
            stream = io.StringIO(xml_string)
            parser.parse(stream)
            stream.close()
        except xml.sax.SAXParseException as e:
            print("*** PARSER error: %s" % e)
    
    XML = """<!DOCTYPE test SYSTEM "test.dtd">
    <test summary='step: &num;'>Entity: &not;</test>
    """
    
    main(XML)
    

    test.dtd contains:

    <!ENTITY num "FOO">
    <!ENTITY pic SYSTEM 'bar.gif' NDATA gif>
    

    Output:

    TestHandler.resolveEntity(): None test.dtd
    TestHandler.unparsedEntityDecl(): None bar.gif
    TestHandler.startElement(): step: FOO
    TestHandler.skippedEntity(): not
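
To try the same scenario without creating test.dtd on disk, resolveEntity can return an InputSource that wraps the DTD bytes. The following is only a sketch for Python 3: the names RecordingHandler, TEST_DTD and collect_events are made up for illustration, and it assumes Python 3.7.1 or later, where external entities have to be enabled through feature_external_ges before the parser will call resolveEntity.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_external_ges
from xml.sax.xmlreader import InputSource

# In-memory stand-in for test.dtd (same declarations as in the answer).
TEST_DTD = b"""<!ENTITY num "FOO">
<!ENTITY pic SYSTEM 'bar.gif' NDATA gif>
"""

XML = b"""<!DOCTYPE test SYSTEM "test.dtd">
<test summary='step: &num;'>Entity: &not;</test>
"""


class RecordingHandler(ContentHandler):
    """Hypothetical handler that records each callback instead of printing."""

    def __init__(self):
        super().__init__()
        self.events = []

    def resolveEntity(self, publicId, systemId):
        self.events.append(("resolveEntity", publicId, systemId))
        # Hand the parser the DTD bytes directly instead of a file name.
        source = InputSource(systemId)
        source.setByteStream(io.BytesIO(TEST_DTD))
        return source

    def unparsedEntityDecl(self, name, publicId, systemId, ndata):
        self.events.append(("unparsedEntityDecl", name, systemId, ndata))

    def startElement(self, name, attrs):
        self.events.append(("startElement", name, attrs.get("summary", "")))

    def skippedEntity(self, name):
        self.events.append(("skippedEntity", name))


def collect_events(xml_bytes):
    handler = RecordingHandler()
    parser = xml.sax.make_parser()
    # Assumption: Python 3.7.1+, where external entities are off by default.
    parser.setFeature(feature_external_ges, True)
    parser.setContentHandler(handler)
    parser.setEntityResolver(handler)
    parser.setDTDHandler(handler)
    parser.parse(io.BytesIO(xml_bytes))
    return handler.events


events = collect_events(XML)
for event in events:
    print(event)
```

Collecting the callbacks in a list rather than printing them makes the order of events easy to compare against the output shown above.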
    

    Addition

    As far as I can tell, skippedEntity is called only when an external DTD is used (at least I can't come up with a counterexample; it would be nice if the documentation were a little clearer).
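
One case that is easy to check: with no DOCTYPE at all, an undeclared entity reference is a fatal well-formedness error, so the parser raises an exception and never gets as far as skippedEntity. A minimal Python 3 sketch (SkipRecorder and try_parse are hypothetical names used only for this illustration):

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler


class SkipRecorder(ContentHandler):
    """Hypothetical handler that remembers every skipped entity name."""

    def __init__(self):
        super().__init__()
        self.skipped = []

    def skippedEntity(self, name):
        self.skipped.append(name)


def try_parse(xml_bytes):
    """Parse xml_bytes; return (skipped entity names, error message or None)."""
    handler = SkipRecorder()
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    error = None
    try:
        parser.parse(io.BytesIO(xml_bytes))
    except xml.sax.SAXParseException as exc:
        error = exc.getMessage()
    return handler.skipped, error


# No DOCTYPE: &not; is undeclared and nothing tells the parser that a
# declaration might exist somewhere else.
skipped, error = try_parse(b"<test>Entity: &not;</test>")
print("skipped:", skipped)
print("error:", error)
```

This would explain why a handler attached to a document without a DTD never sees skippedEntity: the parse fails first.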

    Adam said in his answer that resolveEntity is called only for external DTDs. But that is not quite true. resolveEntity is also called when processing a reference to an external entity that is declared in an internal or external DTD subset. For example:

    <!DOCTYPE test [
    <!ENTITY num SYSTEM "bar.txt">
    ]>
    

    where the content of bar.txt could be, say, FOO. In this case it is not possible to refer to the entity in an attribute value, because XML does not allow references to external entities there.
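
The internal-subset case can be sketched in the same spirit. This is an illustration only, assuming Python 3.7.1 or later (hence the feature_external_ges switch); TextHandler and parse_with_external_entity are invented names, and the bytes b"FOO" stand in for the content of bar.txt so that no file is needed.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_external_ges
from xml.sax.xmlreader import InputSource

# External entity declared in the internal DTD subset, referenced in content.
XML = b"""<!DOCTYPE test [
<!ENTITY num SYSTEM "bar.txt">
]>
<test>Entity: &num;</test>
"""


class TextHandler(ContentHandler):
    """Hypothetical handler that records resolveEntity calls and text."""

    def __init__(self):
        super().__init__()
        self.resolved = []
        self.chunks = []

    def resolveEntity(self, publicId, systemId):
        self.resolved.append((publicId, systemId))
        source = InputSource(systemId)
        source.setByteStream(io.BytesIO(b"FOO"))  # stand-in for bar.txt
        return source

    def characters(self, content):
        # Text may arrive in several pieces, so collect and join later.
        self.chunks.append(content)


def parse_with_external_entity(xml_bytes):
    handler = TextHandler()
    parser = xml.sax.make_parser()
    # Assumption: Python 3.7.1+, where external entities are off by default.
    parser.setFeature(feature_external_ges, True)
    parser.setContentHandler(handler)
    parser.setEntityResolver(handler)
    parser.parse(io.BytesIO(xml_bytes))
    return handler.resolved, "".join(handler.chunks)


resolved, text = parse_with_external_entity(XML)
print(resolved)
print(text)
```

Here resolveEntity is reached even though there is no external DTD, which is the point made above about external entities declared in the internal subset.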

  • 2021-01-16 10:31

    I think resolveEntity and skippedEntity are only called for external DTDs. I got this to work by modifying the XML.

    XML = """<?xml version="1.0" encoding="utf-8" ?>
    <!DOCTYPE test SYSTEM "external.dtd" >
    <test summary='step: &foo; &bar;'>Entity: &not;</test>
    """
    

    The external.dtd contains two simple entity declarations.

    <!ENTITY foo "bar">
    <!ENTITY bar "foo">
    

    Also, I got rid of resolveEntity.

    This outputs:

    TestHandler.startElement(), test : step: bar foo ()
    TestHandler.skippedEntity: not
    

    Hope this helps.
