Using Python's lxml library, I'm trying to load a .xsd as schema. The Python script is in one directory and the schemas are in another.
The problem is that schema_1.xsd includes schema_2.xsd like this:
<xsd:include schemaLocation="schema_2.xsd"/>
Because schema_2.xsd is a relative path (the two schemas are in the same directory), lxml doesn't find it and raises an error:
schema_root = etree.fromstring(open('data/xsd/schema_1.xsd').read().encode('utf-8'))
schema = etree.XMLSchema(schema_root)
--> lxml.etree.XMLSchemaParseError: Element '{http://www.w3.org/2001/XMLSchema}include': Failed to load the document './schema_2.xsd' for inclusion
How can I solve this problem without changing the schema files?
One option is to use an XML Catalog. You could also probably use a custom URI Resolver, but I've always used a catalog. It's easier for non-developers to make configuration changes. This is especially helpful if you're delivering an executable instead of plain Python.
Using a catalog is different between Windows and Linux; see here for more info.
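Part of that platform difference is how the catalog location gets written. As a rough sketch, assuming libxml2 accepts a standard file:// URL in XML_CATALOG_FILES on both platforms, pathlib can build the URL without platform-specific string handling. The helper name catalog_url is hypothetical, not something lxml provides.

```python
import os
from pathlib import Path


def catalog_url(catalog_file="catalog.xml"):
    """Return an absolute file:// URL for the catalog (hypothetical helper)."""
    # resolve() makes the path absolute; as_uri() only works on absolute paths.
    return Path(catalog_file).resolve().as_uri()


# Only set the variable if no catalog has been configured already.
os.environ.setdefault("XML_CATALOG_FILES", catalog_url())
```

Using setdefault keeps any catalog a user has already configured in the environment, which matches the "not in os.environ" check in the example that follows.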
Here's a Windows example using Python 3.#.
XSD #1 (schema_1.xsd)
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
    <xs:include schemaLocation="schema_2.xsd"/>
    <xs:element name="doc">
        <xs:complexType>
            <xs:sequence>
                <xs:element ref="test"/>
            </xs:sequence>
        </xs:complexType>
    </xs:element>
    <xs:element name="test" type="test"/>
</xs:schema>
XSD #2 (schema_2.xsd)
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
    <xs:simpleType name="test">
        <xs:restriction base="xs:string">
            <xs:enumeration value="Hello World"/>
        </xs:restriction>
    </xs:simpleType>
</xs:schema>
XML Catalog (catalog.xml)
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE catalog PUBLIC "-//OASIS//DTD XML Catalogs V1.1//EN" "http://www.oasis-open.org/committees/entity/release/1.1/catalog.dtd">
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
    <!-- The path in @uri is relative to this file (catalog.xml). -->
    <system systemId="schema_2.xsd" uri="./xsd_test/schema_2.xsd"/>
</catalog>
Python
import os
from urllib.request import pathname2url
from lxml import etree

# The XML_CATALOG_FILES environment variable is used by libxml2 (which is used by lxml).
# See http://xmlsoft.org/catalog.html.
if "XML_CATALOG_FILES" not in os.environ:
    # Path to catalog must be a url.
    catalog_path = f"file:{pathname2url(os.path.join(os.getcwd(), 'catalog.xml'))}"
    # Set the environment variable for this process only.
    os.environ['XML_CATALOG_FILES'] = catalog_path

# Read the schema as bytes and close the file when done.
with open('xsd_test/schema_1.xsd', 'rb') as schema_file:
    schema_root = etree.fromstring(schema_file.read())
schema = etree.XMLSchema(schema_root)
# Prints the schema object if it compiled without errors.
print(schema)
Print Output
<lxml.etree.XMLSchema object at 0x02B4B3F0>
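For completeness: the error in the question comes from etree.fromstring() receiving only the schema's text, so libxml2 has no base location against which to resolve schema_2.xsd. If a catalog is more than you need, a minimal sketch of an alternative is to tell lxml where the schema lives, either with etree.parse() on the path or with the base_url argument of etree.fromstring(). The temporary directory and the cut-down schemas below are stand-ins made up for illustration; only the two lxml calls matter. I have only assumed, not checked, that a plain Windows path works as base_url.

```python
import tempfile
from pathlib import Path

from lxml import etree

SCHEMA_1 = """<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
    <xs:include schemaLocation="schema_2.xsd"/>
    <xs:element name="test" type="test"/>
</xs:schema>
"""

SCHEMA_2 = """<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
    <xs:simpleType name="test">
        <xs:restriction base="xs:string">
            <xs:enumeration value="Hello World"/>
        </xs:restriction>
    </xs:simpleType>
</xs:schema>
"""

with tempfile.TemporaryDirectory() as tmp:
    # Illustrative layout: both schemas side by side in one directory.
    xsd_dir = Path(tmp)
    (xsd_dir / "schema_1.xsd").write_text(SCHEMA_1, encoding="utf-8")
    (xsd_dir / "schema_2.xsd").write_text(SCHEMA_2, encoding="utf-8")
    schema_path = xsd_dir / "schema_1.xsd"

    # Option A: parse from the path, so the document remembers its location.
    schema_a = etree.XMLSchema(etree.parse(str(schema_path)))

    # Option B: keep fromstring(), but pass the location as base_url.
    schema_b = etree.XMLSchema(
        etree.fromstring(schema_path.read_bytes(), base_url=str(schema_path))
    )

    # Both schemas resolved the include, so the enumeration is enforced.
    valid = schema_a.validate(etree.fromstring("<test>Hello World</test>"))
    invalid = schema_b.validate(etree.fromstring("<test>Goodbye</test>"))
```

Neither option touches the schema files. The catalog approach is still the better fit when the included schemas live somewhere other than next to the including one, or when non-developers need to change the mapping.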