Validating a yaml document in python

谎友^ 2020-12-24 04:24

One of the benefits of XML is being able to validate a document against an XSD. YAML doesn't have this feature, so how can I validate that the YAML document I open is in th

10 answers
  • 2020-12-24 05:03

    I find Cerberus to be very reliable with great documentation and straightforward to use.

    Here is a basic implementation example:

    my_yaml.yaml:

    name: 'my_name'
    date: 2017-10-01
    metrics:
        percentage:
            value: 87
            trend: stable
    

    Defining the validation schema in schema.py:

    {
        'name': {
            'required': True,
            'type': 'string'
        },
        'date': {
            'required': True,
            'type': 'date'
        },
        'metrics': {
            'required': True,
            'type': 'dict',
            'schema': {
                'percentage': {
                    'required': True,
                    'type': 'dict',
                    'schema': {
                        'value': {
                            'required': True,
                            'type': 'number',
                            'min': 0,
                            'max': 100
                        },
                        'trend': {
                            'type': 'string',
                            'nullable': True,
                            'regex': '(?i)^(down|equal|up)$'
                        }
                    }
                }
            }
        }
    }
    

    Using PyYAML to load the YAML document:

    import yaml

    def load_doc():
        # safe_load avoids constructing arbitrary Python objects and needs no Loader argument
        with open('./my_yaml.yaml', 'r') as stream:
            return yaml.safe_load(stream)  # raises yaml.YAMLError on a syntax error
    
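    Now, validating the YAML file is straightforward:

    import ast
    from cerberus import Validator

    # literal_eval parses the dict literal in schema.py without executing arbitrary code
    with open('./schema.py', 'r') as f:
        schema = ast.literal_eval(f.read())
    v = Validator(schema)
    doc = load_doc()
    print(v.validate(doc, schema))
    print(v.errors)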
    

    Keep in mind that Cerberus is format-agnostic: it validates plain Python data structures, so it works just as well for data loaded from JSON, XML, or any other format that can be parsed into a dict.

  • You can use PyYAML (Python's yaml library) to report the message, character position, line, and file name of a syntax error in the file you load.

    #!/usr/bin/env python
    
    import yaml
    
    with open("example.yaml", 'r') as stream:
        try:
            print(yaml.safe_load(stream))
        except yaml.YAMLError as exc:
            print(exc)
    

    The error message can be accessed via exc.problem

    Access exc.problem_mark to get a <yaml.error.Mark> object.

    This object gives you access to the following attributes:

    • name
    • column
    • line

    Hence you can create your own pointer to the issue:

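    # Inside the except block above (exc is only bound there):
    pm = exc.problem_mark
    # Mark.line and Mark.column are zero-based, so add 1 for human-readable positions
    print("Your file {} has an issue on line {} at position {}".format(pm.name, pm.line + 1, pm.column + 1))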
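Putting these pieces together, a small helper might look like the sketch below. The function name `describe_yaml_error` and its `name` parameter are invented for this example; it relies only on PyYAML's `MarkedYAMLError`, `problem`, and `problem_mark`.

```python
import yaml


def describe_yaml_error(text, name="<string>"):
    """Return None if text parses, else a one-line description of the first syntax error."""
    try:
        yaml.safe_load(text)
    except yaml.MarkedYAMLError as exc:
        mark = exc.problem_mark
        if mark is None:
            return "{}: {}".format(name, exc.problem)
        # Mark.line and Mark.column are zero-based; add 1 for editor-style positions.
        return "{}: {} (line {}, column {})".format(
            name, exc.problem, mark.line + 1, mark.column + 1
        )
    except yaml.YAMLError as exc:
        # Errors without position information fall back to the plain message.
        return "{}: {}".format(name, exc)
    return None


print(describe_yaml_error("key: [1, 2"))   # unclosed flow sequence
print(describe_yaml_error("key: value"))   # parses cleanly, so None
```

Note that this only catches syntax errors; a document can parse cleanly and still have the wrong structure.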
    
  • 2020-12-24 05:06

    Given that JSON and YAML are pretty similar beasts, you could make use of JSON-Schema to validate a sizable subset of YAML. Here's a code snippet (you'll need PyYAML and jsonschema installed):

    from jsonschema import validate
    import yaml
    
    schema = """
    type: object
    properties:
      testing:
        type: array
        items:
          enum:
            - this
            - is
            - a
            - test
    """
    
    good_instance = """
    testing: ['this', 'is', 'a', 'test']
    """
    
    validate(yaml.safe_load(good_instance), yaml.safe_load(schema)) # passes
    
    # Now let's try a bad instance...
    
    bad_instance = """
    testing: ['this', 'is', 'a', 'bad', 'test']
    """
    
    validate(yaml.safe_load(bad_instance), yaml.safe_load(schema))
    
    # Fails with:
    # ValidationError: 'bad' is not one of ['this', 'is', 'a', 'test']
    #
    # Failed validating 'enum' in schema['properties']['testing']['items']:
    #     {'enum': ['this', 'is', 'a', 'test']}
    #
    # On instance['testing'][3]:
    #     'bad'
    

    One problem with this is that if your schema spans multiple files and you use "$ref" to reference the other files then those other files will need to be JSON, I think. But there are probably ways around that. In my own project, I'm playing with specifying the schema using JSON files whilst the instances are YAML.
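The JSON-schema-plus-YAML-instance arrangement mentioned above could be sketched roughly as follows. The schema is inlined as a string here so the example is self-contained; in a real project it would presumably be read from a `.json` file. The helper name `is_valid` is made up for illustration.

```python
import json

import yaml
from jsonschema import ValidationError, validate

# Schema written in JSON (same content as the YAML schema in the answer above).
schema_json = """
{
  "type": "object",
  "properties": {
    "testing": {
      "type": "array",
      "items": {"enum": ["this", "is", "a", "test"]}
    }
  }
}
"""


def is_valid(instance_yaml):
    """Validate a YAML instance (given as text) against the JSON schema."""
    schema = json.loads(schema_json)
    instance = yaml.safe_load(instance_yaml)
    try:
        validate(instance, schema)
    except ValidationError as err:
        print("invalid:", err.message)
        return False
    return True


print(is_valid("testing: ['this', 'is', 'a', 'test']"))
print(is_valid("testing: ['this', 'is', 'a', 'bad', 'test']"))
```

This does not address the multi-file "$ref" question; it only shows that the schema and the instance need not share a serialization format.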

  • 2020-12-24 05:10

    I'm not aware of a Python solution, but there is a Ruby schema validator for YAML called Kwalify. You should be able to call it via subprocess if you don't come across a Python library.
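Calling an external validator through subprocess might look like the sketch below. The helper `run_external_validator` is hypothetical, and the kwalify command line in the comment is an assumption to check against your installation. It is also worth verifying whether the tool reports failure through its exit status or only in its printed report, which is why the helper returns both.

```python
import subprocess
import sys


def run_external_validator(command):
    """Run an external validator; return (exit status, combined stdout and stderr)."""
    result = subprocess.run(command, capture_output=True, text=True)
    return result.returncode, (result.stdout + result.stderr).strip()


# Assumed kwalify invocation (not verified here; consult the tool's help output):
# kwalify_cmd = ["kwalify", "-f", "schema.yaml", "document.yaml"]

# Stand-in command so the sketch runs without Ruby installed.
status, output = run_external_validator([sys.executable, "-c", "print('valid')"])
print(status, output)
```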

  • 2020-12-24 05:11

    These look good. The YAML parser can handle the syntax errors, and one of these libraries can validate the data structures.

    • http://pypi.python.org/pypi/voluptuous/ (I've tried this one, it is decent, if a bit sparse.)
    • http://discorporate.us/projects/flatland/ (not clear how to validate files at first glance)
  • 2020-12-24 05:12

    You can load a YAML document as a dict and use the schema library to check it:

    import datetime

    from schema import Schema, And, Use, Optional, SchemaError
    import yaml
    
    schema = Schema(
            {
                'created': And(datetime.datetime),
                'author': And(str),
                'email': And(str),
                'description': And(str),
                Optional('tags'): And(str, lambda s: len(s) >= 0),
                'setup': And(list),
                'steps': And(list, lambda steps: all('=>' in s for s in steps), error='Steps should be an array of strings '
                                                                                      'and contain "=>" to separate '
                                                                                      'actions and expectations'),
                'teardown': And(list)
            }
        )
    
    # filepath is the path to the YAML file to validate
    with open(filepath) as f:
        data = yaml.safe_load(f)
        try:
            schema.validate(data)
        except SchemaError as e:
            print(e)
    