One of the benefits of XML is being able to validate a document against an XSD. YAML doesn't have this feature, so how can I validate that the YAML document I open is in th
I find Cerberus very reliable, well documented, and straightforward to use.
Here is a basic implementation example:
my_yaml.yaml:
name: 'my_name'
date: 2017-10-01
metrics:
  percentage:
    value: 87
    trend: stable
Defining the validation schema in schema.py:
{
    'name': {
        'required': True,
        'type': 'string'
    },
    'date': {
        'required': True,
        'type': 'date'
    },
    'metrics': {
        'required': True,
        'type': 'dict',
        'schema': {
            'percentage': {
                'required': True,
                'type': 'dict',
                'schema': {
                    'value': {
                        'required': True,
                        'type': 'number',
                        'min': 0,
                        'max': 100
                    },
                    'trend': {
                        'type': 'string',
                        'nullable': True,
                        # The inline flag (?i) must come first; Python 3.11+ rejects it mid-pattern
                        'regex': '(?i)^(down|equal|up)$'
                    }
                }
            }
        }
    }
}
Using PyYAML to load a YAML document:
import yaml

def load_doc():
    # safe_load does not construct arbitrary Python objects;
    # a yaml.YAMLError raised on a syntax error propagates to the caller
    with open('./my_yaml.yaml', 'r') as stream:
        return yaml.safe_load(stream)
Now, validating the YAML file is straightforward:
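import ast
from cerberus import Validator

# schema.py holds a single dict literal, so literal_eval can parse it without executing code
with open('./schema.py', 'r') as f:
    schema = ast.literal_eval(f.read())

v = Validator(schema)
doc = load_doc()
print(v.validate(doc))  # True if the document matches the schema
print(v.errors)         # validation errors as a dict, empty when valid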
Keep in mind that Cerberus is a format-agnostic data validation tool: it validates plain Python data structures, so it works with any format that can be loaded into them, such as JSON or XML, not just YAML.
You can use Python's yaml library (PyYAML) to display the message, character position, line, and file name for a syntax error in the file you load.
#!/usr/bin/env python
import yaml

with open("example.yaml", 'r') as stream:
    try:
        print(yaml.safe_load(stream))
    except yaml.YAMLError as exc:
        print(exc)
The error message can be accessed via exc.problem. Access exc.problem_mark to get a <yaml.error.Mark> object. This object exposes the attributes name, line, and column (line and column are zero-based).
Hence you can create your own pointer to the issue:
pm = exc.problem_mark
# Mark.line and Mark.column are zero-based, so add 1 for a human-readable position
print("Your file {} has an issue on line {} at position {}".format(pm.name, pm.line + 1, pm.column + 1))
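Putting the pieces together, a small helper could look like the sketch below. The function name describe_yaml_error and the sample strings are made up for illustration; the sketch catches yaml.MarkedYAMLError first, since only errors of that type carry a problem_mark.

```python
import yaml


def describe_yaml_error(text):
    """Return a one-line description of the first syntax error, or None if the text parses."""
    try:
        yaml.safe_load(text)
    except yaml.MarkedYAMLError as exc:
        mark = exc.problem_mark
        if mark is None:
            return str(exc.problem)
        # Mark.line and Mark.column are zero-based, so add 1 for display.
        return "{}: line {}, column {}".format(exc.problem, mark.line + 1, mark.column + 1)
    except yaml.YAMLError as exc:
        # Other YAML errors carry no position information.
        return str(exc)
    return None


# Hypothetical input: the second line has two mapping separators.
broken = "name: ok\nbad: a: b\n"
print(describe_yaml_error(broken))
print(describe_yaml_error("name: ok\n"))
```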
Given that JSON and YAML are pretty similar beasts, you could make use of JSON-Schema to validate a sizable subset of YAML. Here's a code snippet (you'll need PyYAML and jsonschema installed):
from jsonschema import validate
import yaml

schema = """
type: object
properties:
  testing:
    type: array
    items:
      enum:
        - this
        - is
        - a
        - test
"""

good_instance = """
testing: ['this', 'is', 'a', 'test']
"""

validate(yaml.safe_load(good_instance), yaml.safe_load(schema))  # passes

# Now let's try a bad instance...

bad_instance = """
testing: ['this', 'is', 'a', 'bad', 'test']
"""

validate(yaml.safe_load(bad_instance), yaml.safe_load(schema))

# Fails with:
# ValidationError: 'bad' is not one of ['this', 'is', 'a', 'test']
#
# Failed validating 'enum' in schema['properties']['testing']['items']:
#     {'enum': ['this', 'is', 'a', 'test']}
#
# On instance['testing'][3]:
#     'bad'
One problem with this is that if your schema spans multiple files and you use "$ref" to reference the other files, then those other files will need to be JSON, I think. But there are probably ways around that. In my own project, I'm playing with specifying the schema using JSON files whilst the instances are YAML.
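A rough sketch of that mixed setup, with the schema written as JSON and the instance as YAML, might look like this. The inline strings stand in for files, Draft7Validator is just one draft picked for the example, and iter_errors is used so that every problem is reported rather than only the first.

```python
import json

import yaml
from jsonschema import Draft7Validator

# Stand-in for a JSON schema file (same schema as above).
schema_json = """
{
  "type": "object",
  "properties": {
    "testing": {
      "type": "array",
      "items": {"enum": ["this", "is", "a", "test"]}
    }
  }
}
"""

# Stand-in for a YAML instance file with two invalid items.
instance_yaml = """
testing: ['this', 'bad', 'worse']
"""

validator = Draft7Validator(json.loads(schema_json))
instance = yaml.safe_load(instance_yaml)

# Collect (path, message) pairs for every violation, ordered by location.
problems = [
    (list(error.absolute_path), error.message)
    for error in sorted(
        validator.iter_errors(instance),
        key=lambda e: [str(p) for p in e.absolute_path],
    )
]
for path, message in problems:
    print(path, message)
```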
I'm not aware of a Python solution, but there is a Ruby schema validator for YAML called Kwalify. You should be able to call it using subprocess if you don't come across a Python library.
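Calling an external validator through subprocess could be sketched as follows. The helper name run_external_validator is made up, and the kwalify command line in the comment is an assumption to check against the Kwalify documentation, as is how it signals a failed validation. The runnable part uses the Python interpreter as a stand-in command.

```python
import shutil
import subprocess
import sys


def run_external_validator(command):
    """Run an external command and return (exit code, combined stdout and stderr)."""
    if shutil.which(command[0]) is None:
        raise FileNotFoundError("{!r} was not found on PATH".format(command[0]))
    completed = subprocess.run(command, capture_output=True, text=True)
    return completed.returncode, completed.stdout + completed.stderr


# Assumed kwalify invocation (verify the flags before relying on it):
# code, output = run_external_validator(["kwalify", "-f", "schema.yaml", "my_yaml.yaml"])

# Stand-in command that only demonstrates the helper itself.
code, output = run_external_validator([sys.executable, "-c", "print('valid')"])
print(code, output.strip())
```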
These look good. The YAML parser can handle the syntax errors, and one of these libraries can validate the data structures.
You can load the YAML document as a dict and use the schema library to check it:
import datetime

import yaml
from schema import Schema, And, Optional, SchemaError

schema = Schema(
    {
        'created': And(datetime.datetime),
        'author': And(str),
        'email': And(str),
        'description': And(str),
        Optional('tags'): And(str, lambda s: len(s) >= 0),
        'setup': And(list),
        'steps': And(list, lambda steps: all('=>' in s for s in steps),
                     error='Steps should be array of string '
                           'and contain "=>" to separate '
                           'actions and expectations'),
        'teardown': And(list)
    }
)

# filepath is the path to the YAML document to check (not defined in this snippet)
with open(filepath) as f:
    data = yaml.safe_load(f)

try:
    schema.validate(data)
except SchemaError as e:
    print(e)