I have a set of jsonschema compliant documents. Some documents contain references to other documents (via the $ref
attribute). I do not wish to host these docum
Following up on the answer @chris-w provided, I wanted to do the same thing with jsonschema 3.2.0, but his answer didn't quite cover it. I hope this answer helps those who are still coming to this question but are using a more recent version of the package.
To extend a JSON schema using the library, do the following:
base.schema.json
{
"$id": "base.schema.json",
"type": "object",
"properties": {
"prop": {
"type": "string"
}
},
"required": ["prop"]
}
extend.schema.json
{
"allOf": [
{"$ref": "base.schema.json"},
{
"properties": {
"extra": {
"type": "boolean"
}
},
"required": ["extra"]
}
]
}
data.json
{
"prop": "This is the property",
"extra": true
}
import json
from jsonschema import RefResolver, Draft7Validator
# Set up schema, resolver, and validator on the base schema
baseSchema = json.loads(baseSchemaJSON) # Create a schema dictionary from the base JSON file
relativeSchema = json.loads(relativeJSON) # Create a schema dictionary from the relative JSON file
resolver = RefResolver.from_schema(baseSchema) # Creates your resolver, uses the "$id" element
validator = Draft7Validator(relativeSchema, resolver=resolver) # Create a validator against the extended schema (but resolving to the base schema!)
# Check validation!
data = json.loads(dataJSON) # Create a dictionary from the data JSON file
validator.validate(data)
You may need to make a few adjustments to the above, such as using a validator other than Draft7Validator. This should work for single-level references (children extending a base), but you will need to be careful with your schemas and with how you set up the RefResolver and validator objects.
P.S. Here is a snippet that exercises the above. Try modifying the data string to remove one of the required attributes:
import json
from jsonschema import RefResolver, Draft7Validator
base = """
{
"$id": "base.schema.json",
"type": "object",
"properties": {
"prop": {
"type": "string"
}
},
"required": ["prop"]
}
"""
extend = """
{
"allOf": [
{"$ref": "base.schema.json"},
{
"properties": {
"extra": {
"type": "boolean"
}
},
"required": ["extra"]
}
]
}
"""
data = """
{
"prop": "This is the property string",
"extra": true
}
"""
schema = json.loads(base)
extendedSchema = json.loads(extend)
resolver = RefResolver.from_schema(schema)
validator = Draft7Validator(extendedSchema, resolver=resolver)
jsonData = json.loads(data)
validator.validate(jsonData)
You must build a custom jsonschema.RefResolver for each schema which uses a relative reference and ensure that your resolver knows where on the filesystem the given schema lives.
Such as...
import os
import json
from jsonschema import Draft4Validator, RefResolver # We prefer Draft7, but jsonschema 3.0 is still in alpha as of this writing
abs_path_to_schema = '/path/to/schema-doc-foobar.json'
with open(abs_path_to_schema, 'r') as fp:
    schema = json.load(fp)
resolver = RefResolver(
    # The key part is here where we build a custom RefResolver
    # and tell it where *this* schema lives in the filesystem
    # Note that `file:` is for unix systems
    # RefResolver's first two parameters are named base_uri and referrer
    base_uri='file:{}'.format(abs_path_to_schema),
    referrer=schema
)
Draft4Validator.check_schema(schema) # Unnecessary but a good idea
validator = Draft4Validator(schema, resolver=resolver, format_checker=None)
# Then you can...
data_to_validate = {...} # placeholder for your own data
validator.validate(data_to_validate)
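The `file:` prefix above is built by hand and, as noted, assumes a Unix-style path. As a hedged sketch of a more portable alternative, the standard library's `pathlib.Path.as_uri()` can produce the base URI, and `urllib.parse.urljoin` shows where a relative `$ref` would then point. The temp directory and the `definitions.json` name are stand-ins for illustration, not part of the original answer.

```python
import tempfile
from pathlib import Path
from urllib.parse import urljoin

# Stand-in location for the schema file; any absolute path works.
schema_dir = Path(tempfile.gettempdir()).resolve()
abs_path_to_schema = schema_dir / "schema-doc-foobar.json"

# as_uri() requires an absolute path and yields a file:// URI on any OS.
base_uri = abs_path_to_schema.as_uri()

# A relative $ref is joined against the base URI, so it lands on a
# sibling file in the same directory ("definitions.json" is hypothetical).
resolved = urljoin(base_uri, "definitions.json")

print(base_uri)
print(resolved)
```

The resulting `base_uri` string could be passed as the first argument to `RefResolver` in place of the hand-built one.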
Fixed a wrong reference ($ref) to the base schema. Updated the example to use the one from the docs: https://json-schema.org/understanding-json-schema/structuring.html
This is just another version of @Daniel's answer, which was the correct one for me. Basically, I decided to define the $schema in a base schema, which frees the other schemas from declaring it and makes for a clear call when instantiating the resolver.
RefResolver.from_schema() takes (1) some schema and (2) a schema store, and it was not very clear to me whether the order mattered or which "some" schema was relevant here. Hence the structure you see below. I have the following:
base.schema.json:
{
"$schema": "http://json-schema.org/draft-07/schema#"
}
definitions.schema.json:
{
"type": "object",
"properties": {
"street_address": { "type": "string" },
"city": { "type": "string" },
"state": { "type": "string" }
},
"required": ["street_address", "city", "state"]
}
address.schema.json:
{
"type": "object",
"properties": {
"billing_address": { "$ref": "definitions.schema.json#" },
"shipping_address": { "$ref": "definitions.schema.json#" }
}
}
I like this setup for two reasons:
First, it makes for a cleaner call on RefResolver.from_schema():
import json
from jsonschema import RefResolver
from jsonschema.validators import validator_for
base = json.loads(open('base.schema.json').read())
definitions = json.loads(open('definitions.schema.json').read())
schema = json.loads(open('address.schema.json').read())
# Key each schema by its "$id", falling back to its file name
schema_store = {
    base.get('$id', 'base.schema.json'): base,
    definitions.get('$id', 'definitions.schema.json'): definitions,
    schema.get('$id', 'address.schema.json'): schema,
}
resolver = RefResolver.from_schema(base, store=schema_store)
Second, I can use the handy validator_for helper the library provides, which gives you the best validator for your schema (according to its $schema key):
Validator = validator_for(base)
And then just put them together to instantiate the validator:
validator = Validator(schema, resolver=resolver)
Finally, you validate your data:
data = { "shipping_address": { "street_address": "1600 Pennsylvania Avenue NW", "city": "Washington", "state": "DC" }, "billing_address": { "street_address": "1st Street SE", "city": "Washington", "state": 32 } }
This one fails because of "state": 32:
>>> validator.validate(data)
ValidationError: 32 is not of type 'string'
Failed validating 'type' in schema['properties']['billing_address']['properties']['state']:
{'type': 'string'}
On instance['billing_address']['state']:
32
Change that to "DC", and it will validate.
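The reason the store keys line up with the `$ref` values can be sketched with the standard library alone. As far as I understand jsonschema 3.2.0, the resolver joins each `$ref` against the current base URI (the `$id` of the schema passed to `from_schema()`, or an empty string if it has none), strips the fragment, and looks the result up in the store. The `store_key` helper below is a hypothetical, simplified model of that lookup, not the library's actual code.

```python
from urllib.parse import urldefrag, urljoin


def store_key(base_uri: str, ref: str) -> str:
    """Return the URL a $ref would be looked up under (simplified model)."""
    full_url = urljoin(base_uri, ref)      # resolve the ref against the base URI
    url, _fragment = urldefrag(full_url)   # drop the "#..." JSON pointer part
    return url


# Same keys as the schema_store above; the values are empty placeholders.
schema_store = {
    "base.schema.json": {},
    "definitions.schema.json": {},
    "address.schema.json": {},
}

# base.schema.json has no "$id", so the base URI is the empty string.
key = store_key("", "definitions.schema.json#")
print(key, key in schema_store)
```

Under this model, which schema is passed first to `from_schema()` matters only through its `$id`; the store does the rest of the work.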
I had the hardest time figuring out how to resolve against a set of schemas that $ref each other (I am new to JSON Schemas). It turns out the key is to create the RefResolver with a store that is a dict which maps from URL to schema.
Building on @devin-p's answer:
import json
from jsonschema import RefResolver, Draft7Validator
base = """
{
"$id": "base.schema.json",
"type": "object",
"properties": {
"prop": {
"type": "string"
}
},
"required": ["prop"]
}
"""
extend = """
{
"$id": "extend.schema.json",
"allOf": [
{"$ref": "base.schema.json#"},
{
"properties": {
"extra": {
"type": "boolean"
}
},
"required": ["extra"]
}
]
}
"""
extend_extend = """
{
"$id": "extend_extend.schema.json",
"allOf": [
{"$ref": "extend.schema.json#"},
{
"properties": {
"extra2": {
"type": "boolean"
}
},
"required": ["extra2"]
}
]
}
"""
data = """
{
"prop": "This is the property string",
"extra": true,
"extra2": false
}
"""
schema = json.loads(base)
extendedSchema = json.loads(extend)
extendedExtendSchema = json.loads(extend_extend)
schema_store = {
    schema['$id']: schema,
    extendedSchema['$id']: extendedSchema,
    extendedExtendSchema['$id']: extendedExtendSchema,
}
resolver = RefResolver.from_schema(schema, store=schema_store)
validator = Draft7Validator(extendedExtendSchema, resolver=resolver)
jsonData = json.loads(data)
validator.validate(jsonData)
The above was built with jsonschema==3.2.0.
My approach is to preload all schema fragments into the RefResolver cache. I created a gist that illustrates this: https://gist.github.com/mrtj/d59812a981da17fbaa67b7de98ac3d4b
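Preloading can be done in several ways; the following is a minimal sketch of one of them and is not taken from the gist. It reads every `*.json` file in a directory and builds a store dict keyed by each schema's `$id`, falling back to the file name. The function name `load_schema_store` and the flat directory layout are assumptions made for illustration.

```python
import json
from pathlib import Path


def load_schema_store(schema_dir):
    """Map each schema's "$id" (or its file name, if it has none) to the schema."""
    store = {}
    for path in sorted(Path(schema_dir).glob("*.json")):
        with path.open(encoding="utf-8") as fp:
            schema = json.load(fp)
        # Fall back to the file name so schemas without "$id" stay reachable
        store[schema.get("$id", path.name)] = schema
    return store
```

The resulting dict could then be passed as `store=` to `RefResolver.from_schema()`, as in the answers above.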