XML (.xsd) feed validation against a schema

后悔当初 2020-12-24 03:23

I have an XML file and an XML schema. I want to validate the file against the schema and check whether it conforms to it. I am using Python but am open to any language for

2 Answers
  • 2020-12-24 03:37

    Definitely lxml.

    Define an XMLParser with a predefined schema, load the file with fromstring(), and catch any XML Schema validation errors:

    from lxml import etree
    
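    def validate(xmlparser, xmlfilename):
        """Return True if the file parses and validates against the parser's schema."""
        try:
            with open(xmlfilename, 'r') as f:
                etree.fromstring(f.read(), xmlparser)
            return True
        # A schema-validating parser reports violations as XMLSyntaxError.
        except (etree.XMLSyntaxError, etree.XMLSchemaError):
            return False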
    
    schema_file = 'schema.xsd'
    with open(schema_file, 'r') as f:
        schema_root = etree.XML(f.read())
    
    schema = etree.XMLSchema(schema_root)
    xmlparser = etree.XMLParser(schema=schema)
    
    filenames = ['input1.xml', 'input2.xml', 'input3.xml']
    for filename in filenames:
        if validate(xmlparser, filename):
            print("%s validates" % filename)
        else:
            print("%s doesn't validate" % filename)
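
A boolean result does not say why a document failed. As a minimal sketch, assuming you are willing to call the schema object directly instead of attaching it to the parser, lxml's XMLSchema.validate() and its error_log can be used to collect the messages. The inline schema, the sample documents, and the helper name validation_errors are made up for illustration:

```python
from io import BytesIO

from lxml import etree

# Hypothetical inline schema, used only for illustration.
XSD = b"""<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="count" type="xs:integer"/>
</xs:schema>"""


def validation_errors(schema, xml_bytes):
    """Return a list of error messages; an empty list means the document is valid."""
    try:
        doc = etree.parse(BytesIO(xml_bytes))
    except etree.XMLSyntaxError as exc:
        # The document is not even well-formed XML.
        return [str(exc)]
    if schema.validate(doc):
        return []
    # error_log holds the errors recorded by the most recent validate() call.
    return ["line %d: %s" % (entry.line, entry.message) for entry in schema.error_log]


schema = etree.XMLSchema(etree.parse(BytesIO(XSD)))
good = validation_errors(schema, b"<count>5</count>")
bad = validation_errors(schema, b"<count>five</count>")
malformed = validation_errors(schema, b"<count>")

print(good)
for message in bad + malformed:
    print(message)
```

Separating parsing from validation this way also makes it possible to tell a malformed file apart from a well-formed file that violates the schema.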
    

    Note about encoding

    If the schema file starts with an XML declaration that specifies an encoding (e.g. <?xml version="1.0" encoding="UTF-8"?>), reading it in text mode produces a str, and the code above will raise the following error:

    Traceback (most recent call last):
      File "<input>", line 2, in <module>
        schema_root = etree.XML(f.read())
      File "src/lxml/etree.pyx", line 3192, in lxml.etree.XML
      File "src/lxml/parser.pxi", line 1872, in lxml.etree._parseMemoryDocument
    ValueError: Unicode strings with encoding declaration are not supported. Please use bytes input or XML fragments without declaration.
    

    A solution is to open the files in binary mode, so that lxml receives bytes and handles the declared encoding itself: open(..., 'rb')

    [...]
    def validate(xmlparser, xmlfilename):
        try:
            with open(xmlfilename, 'rb') as f:
    [...]
    with open(schema_file, 'rb') as f:
    [...]
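
One way to sidestep the text-versus-bytes question entirely, sketched here under the assumption that the files live on disk, is to hand the path to etree.parse() and let lxml open the file. The helper names load_schema and is_valid, the sample schema, and the temporary files are hypothetical and exist only to make the example runnable:

```python
import os
import tempfile

from lxml import etree


def load_schema(schema_path):
    # etree.parse() reads the file itself, so an encoding declaration is fine.
    return etree.XMLSchema(etree.parse(schema_path))


def is_valid(schema, xml_path):
    try:
        doc = etree.parse(xml_path)
    except etree.XMLSyntaxError:
        return False  # not well-formed XML
    return schema.validate(doc)


# Hypothetical sample files, each starting with an encoding declaration.
DECL = b'<?xml version="1.0" encoding="UTF-8"?>\n'
SAMPLES = {
    "schema.xsd": DECL + b"""<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="count" type="xs:integer"/>
</xs:schema>""",
    "good.xml": DECL + b"<count>5</count>",
    "bad.xml": DECL + b"<count>five</count>",
}

with tempfile.TemporaryDirectory() as tmp:
    paths = {}
    for name, data in SAMPLES.items():
        paths[name] = os.path.join(tmp, name)
        with open(paths[name], "wb") as f:  # binary mode, as suggested above
            f.write(data)

    schema = load_schema(paths["schema.xsd"])
    results = {name: is_valid(schema, paths[name]) for name in ("good.xml", "bad.xml")}

for name, ok in results.items():
    print("%s %s" % (name, "validates" if ok else "doesn't validate"))
```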
    
  • 2020-12-24 03:44

    The Python snippet is good, but an alternative is to use the xmllint command-line tool:

    xmllint --schema sample.xsd --noout sample.xml
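
If you still want to drive xmllint from a Python script, a rough sketch could look like the following. It assumes xmllint is installed and on PATH, and it relies on xmllint exiting with status 0 when the document validates and a non-zero status otherwise; the helper names xmllint_command and validate_with_xmllint are hypothetical:

```python
import shutil
import subprocess


def xmllint_command(schema_path, xml_path):
    """Build the argument list for the xmllint call shown above."""
    return ["xmllint", "--schema", schema_path, "--noout", xml_path]


def validate_with_xmllint(schema_path, xml_path):
    """Return (is_valid, diagnostics) as reported by xmllint."""
    if shutil.which("xmllint") is None:
        raise FileNotFoundError("xmllint is not installed or not on PATH")
    result = subprocess.run(
        xmllint_command(schema_path, xml_path),
        capture_output=True,
        text=True,
    )
    # Exit status 0 means the document validated; diagnostics go to stderr.
    return result.returncode == 0, result.stderr.strip()


if __name__ == "__main__":
    ok, diagnostics = validate_with_xmllint("sample.xsd", "sample.xml")
    print(diagnostics)
```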
    