Using Python and lxml to validate XML against an external DTD

Posted by 我们两清 on 2019-12-01 01:10:34

You need to pass no_network=False when constructing the parser object. The option defaults to True, which blocks network access, so a DTD referenced by URL cannot be fetched.

From the documentation of parser options at http://lxml.de/parsing.html#parsers:

no_network - prevent network access when looking up external documents (on by default)
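As a minimal sketch of that option in use (the helper name validate_against_dtd and its allow_network argument are made up for illustration, they are not part of lxml):

```python
from lxml import etree


def validate_against_dtd(xml_path, allow_network=True):
    """Parse xml_path and validate it against the DTD named in its DOCTYPE.

    Hypothetical helper: returns (is_valid, error_message).
    """
    parser = etree.XMLParser(
        dtd_validation=True,               # validate while parsing
        no_network=not allow_network,      # False lets the parser fetch remote DTDs
    )
    try:
        etree.parse(xml_path, parser=parser)
    except etree.XMLSyntaxError as error:
        # Both well-formedness and DTD validity problems end up here.
        return False, str(error)
    return True, ""
```

Calling validate_against_dtd("path/to/my/file.xml") returns a (bool, message) pair. A DTD that sits next to the XML file is found either way; the allow_network switch only matters when the DOCTYPE points at a remote URL.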

For reasons I still do not understand, my problem turned out to depend on where the XML catalog was located on my local file system.

In my case, I use an XML editor that has a tight integration with a component content management system (CCMS, in this case SDL Trisoft 2011 R2). When the editor connects to the CCMS, DTDs, catalog files and a bunch of other files are synced. These files end up on the local file system in:

C:\Users\[username]\AppData\Local\Trisoft\InfoShare Client\[id]\Config\DocTypes\catalog.xml

I could not get that to work. Simply COPYING the whole catalog to another location fixed things, and this works:

import os
from lxml import etree

f = r"path/to/my/file.xml"
# set the XML catalog file path
os.environ['XML_CATALOG_FILES'] = r'C:\DATA\Mydoctypes\catalog.xml'
# configure the parser to validate against the DTD while parsing
parser = etree.XMLParser(dtd_validation=True, no_network=True)
# validate: parsing raises XMLSyntaxError if the file does not match the DTD
try:
    tree = etree.parse(f, parser=parser)
    print("This file is valid against the DTD.")
except etree.XMLSyntaxError as error:
    print("This file is INVALID against the DTD!")
    print(error)

Obviously this is not ideal, but it works.

Could it be something to do with file permissions, or perhaps that good old "file path too long" problem in Windows? I have not tried whether a symbolic link would work.
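Those guesses can at least be checked from Python. Below is a rough diagnostic sketch, not a confirmed explanation of my problem; the function name diagnose_catalog_path is made up for illustration. Besides permissions and path length it also flags whitespace in the path, because libxml2 documents XML_CATALOG_FILES as a space-separated list of catalogs, and the synced folder name ("InfoShare Client") contains a space while the copied location does not.

```python
import os

WINDOWS_MAX_PATH = 260  # classic Win32 MAX_PATH limit, in characters


def diagnose_catalog_path(path):
    """Return a dict of simple checks on a catalog file path (hypothetical helper)."""
    absolute = os.path.abspath(path)
    return {
        "path": absolute,
        "exists": os.path.isfile(absolute),
        "readable": os.access(absolute, os.R_OK),
        "length": len(absolute),
        "exceeds_max_path": len(absolute) >= WINDOWS_MAX_PATH,
        # A space would split the value of XML_CATALOG_FILES into two entries.
        "has_whitespace": any(ch.isspace() for ch in absolute),
    }


if __name__ == "__main__":
    # Path taken from the working example above; substitute your own.
    report = diagnose_catalog_path(r"C:\DATA\Mydoctypes\catalog.xml")
    for name, value in report.items():
        print("{}: {}".format(name, value))
```

Running it once for the synced location and once for the copied one should show which of the checks differs between the two.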

I am using Windows 7, Python 2.7.11 and lxml 3.6.0.
