Using Python and lxml to validate XML against an external DTD

闹比i 2021-01-06 14:24

I'm trying to validate an XML file against an external DTD referenced in its DOCTYPE declaration. Specifically:



        
2 answers
  • 2021-01-06 14:46

    Pass no_network=False when constructing the parser object. The option defaults to True, which blocks network access, so a DTD that the DOCTYPE references by URL cannot be fetched.

    From the documentation of parser options at http://lxml.de/parsing.html#parsers:

    no_network - prevent network access when looking up external documents (on by default)
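
As a minimal sketch of how that option fits into a validating parser: the helper name `validate_against_doctype` and its `allow_network` flag are made up for illustration, and the demo uses a small local DTD in a temporary directory so it runs without touching the network.

```python
import os
import tempfile

from lxml import etree


def validate_against_doctype(xml_path, allow_network=False):
    """Parse xml_path and validate it against the DTD named in its DOCTYPE.

    Returns (is_valid, message). The function name and the allow_network
    flag are illustrative, not part of lxml.
    """
    # no_network defaults to True in lxml, so it has to be switched off
    # explicitly when the DTD must be fetched from a URL.
    parser = etree.XMLParser(dtd_validation=True, no_network=not allow_network)
    try:
        etree.parse(xml_path, parser=parser)
    except etree.XMLSyntaxError as error:
        return False, str(error)
    return True, ""


# Demo with a local DTD so the sketch runs without any network access.
workdir = tempfile.mkdtemp()
with open(os.path.join(workdir, "note.dtd"), "w") as fh:
    fh.write("<!ELEMENT note (to, body)>\n"
             "<!ELEMENT to (#PCDATA)>\n"
             "<!ELEMENT body (#PCDATA)>\n")

doctype = '<!DOCTYPE note SYSTEM "note.dtd">\n'
good_path = os.path.join(workdir, "good.xml")
bad_path = os.path.join(workdir, "bad.xml")
with open(good_path, "w") as fh:
    fh.write(doctype + "<note><to>A</to><body>B</body></note>")
with open(bad_path, "w") as fh:
    fh.write(doctype + "<note><to>A</to></note>")  # missing <body>

good_result = validate_against_doctype(good_path)
bad_result = validate_against_doctype(bad_path)
print(good_result)
print(bad_result)
```

For a DOCTYPE that points at a remote DTD, the same helper would be called with `allow_network=True`, which corresponds to the `no_network=False` setting described above.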

  • 2021-01-06 14:50

    For reasons I still don't understand, my problem turned out to be related to where the XML catalog was located on my local file system.

    In my case, I use an XML editor that has a tight integration with a component content management system (CCMS, in this case SDL Trisoft 2011 R2). When the editor connects to the CCMS, DTDs, catalog files and a bunch of other files are synced. These files end up on the local file system in:

    C:\Users\[username]\AppData\Local\Trisoft\InfoShare Client\[id]\Config\DocTypes\catalog.xml
    

    I could not get that to work. Simply COPYING the whole catalog to another location fixed things, and this works:

    import os
    from lxml import etree

    f = r"path/to/my/file.xml"
    # set XML catalog file path
    os.environ['XML_CATALOG_FILES'] = r'C:\DATA\Mydoctypes\catalog.xml'
    # configure parser: validate against the DTD without network access
    parser = etree.XMLParser(dtd_validation=True, no_network=True)
    # validate: parsing raises XMLSyntaxError if the file violates the DTD
    try:
        valid = etree.parse(f, parser=parser)
        print("This file is valid against the DTD.")
    except etree.XMLSyntaxError as error:
        print("This file is INVALID against the DTD!")
        print(error)
    

    Obviously this is not ideal, but it works.

    Could it be something to do with file permissions, or perhaps that good old "file path too long" problem in Windows? I have not tried whether a symbolic link would work.
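
One way to start narrowing down those guesses is to collect a few basic facts about the catalog path before handing it to lxml. The sketch below is only an assumption about what is worth checking: `describe_catalog_path` is a hypothetical helper, 260 is the classic Win32 MAX_PATH limit, and the check for spaces is included because libxml2 documents XML_CATALOG_FILES as a space-separated list, so a directory name such as "InfoShare Client" may be another candidate cause.

```python
import os
import tempfile

WINDOWS_MAX_PATH = 260  # classic Win32 limit, counting the terminating NUL


def describe_catalog_path(path):
    """Collect basic facts about a catalog path (hypothetical helper)."""
    full = os.path.abspath(path)
    return {
        "path": full,
        "exists": os.path.isfile(full),
        # os.access is only a rough check on Windows; it does not see ACLs.
        "readable": os.access(full, os.R_OK),
        "length": len(full),
        "exceeds_max_path": len(full) >= WINDOWS_MAX_PATH,
        # XML_CATALOG_FILES is split on spaces, so a space inside the path
        # could make libxml2 see two bogus entries instead of one.
        "has_spaces": " " in full,
    }


# Demo on throwaway files; substitute the real catalog location when testing.
demo_dir = tempfile.mkdtemp()
spaced_path = os.path.join(demo_dir, "my catalog.xml")
with open(spaced_path, "w") as fh:
    fh.write("<catalog/>")

spaced_report = describe_catalog_path(spaced_path)
missing_report = describe_catalog_path(os.path.join(demo_dir, "missing.xml"))
for key in sorted(spaced_report):
    print(key, spaced_report[key])
```

If the report for the original location flags a space or an over-long path while the copied location does not, that would point toward the path itself rather than file permissions.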

    I am using Windows 7, Python 2.7.11 and lxml 3.6.0.
