How to verify that one XSD schema is a subset of another XSD schema?

后端 未结 4 1525
轮回少年
轮回少年 2021-02-04 15:43

How can I verify that one XSD schema is a subset of another XSD schema?

We are creating a system-of-systems application using a collection of \"blueprint\" XSD schemas (

4条回答
  •  执念已碎
    2021-02-04 16:25

    The simplest way to ensure the relationship you want is to derive the types of the subset schemas by restriction from the types of the blueprint schema. It sounds as if that boat has already sailed, though.

    Like others here, I am not aware of any tools that do this out of the box (although if Petru Gardea says QT Assistant can, it's worth following up).

    One complication is that there are two different ways to view the subset/superset relation you want to verify: (1) every document (or element) accepted as valid by schema 1 is also accepted as valid by schema 2 (without reference to the type assignments made), or (2) the typed documents produced by validation (in what the spec calls the post-schema-validation infoset) against schemas 1 and 2 stand in an appropriate relation to each other: if an element or attribute is valid in tree 1, it's valid in tree 2; the type assigned to it in tree 1 is a restriction of the type assigned to it in tree 2; etc. If schemas 1 and 2 were developed independently, the chances that their types are related by derivation are poor, so I guess you have the first approach to the question in mind.

    The problem, though, is definitely decidable, in either form. For any schema (I'm using the term carefully) there are by definition a finite number of types and a finite number of element names declared; it follows that there is a finite number (possibly large) of element name / type pairs.

    The algorithm can go something like this.

    1. Start with the expected root element. (If there are multiple possible root elements, then in the general case you'll need to run this check for each of them.) If the expected root element is E, with type T1 in schema 1 and type T2 in schema 2, then place the task "Compare type T1 and T2" in a queue of open tasks. The list of tasks already completed will be empty.

    2. To compare two complex types T1 and T2:

      • Check the sets of attributes declared for T1 and T2 for a subset/superset relation between their names. Make sure no attribute required in the intended superset is absent or optional in the intended subset.

      • Each attribute A declared for both T1 and T2 will be assigned a type (call them ST1 and ST2). If ST1 = ST2, do nothing; otherwise, add the task "Compare simple types ST1 and ST2" to the queue of open tasks, unless it's on the list of comparisons already completed.

      • Now check the sequences of children that are possible in T1 and T2 -- as 13ren suggests in a comment, this is tractable since content models are essentially regular expressions which use the set of element names as their alphabet; the languages they define are therefore regular, and the subset/superset relation is decidable for regular languages.

      • Each possible child element C is assigned both an element declaration and a type definition by the parent types T1 and T2. Let us call them ED1, ED2, CT1, and CT2. Every child of the same name will have the same type, but different children may match different element declarations. So for any possible name, there will be just one pair of types CT1 and CT2, but there may be multiple pairs ED1 and ED2 (and the analysis will need to be careful to make sure they are matched up correctly; that might be hard to automate).

      • If CT1 = CT2, do nothing, otherwise put "Compare types CT1 and CT2" onto the open task queue, unless the comparison has already been performed.

      • If ED1 and ED2 are structurally identical, do nothing; otherwise put the task of comparing them into the task queue (unless it's already been done).

    3. To compare two simple types ST1 and ST2, compare either their lexical spaces (if you want the first definition of the subset/superset relation on schemas) or their value spaces (if you want the second). If ST1 and ST2 are both restrictions of the same primitive type, you may be able to compare the set of effective facet-based restrictions on them easily. The pattern facet may complicate matters, but because it defines a set of regular expressions, the subset/superset relation is decidable for it.

    4. To compare two element declarations, you need to compare each of the properties for the element declaration and check for the desired subset/superset relation.

    As you can see, it's complex and tedious enough that you really want to automate this analysis, and it's also complex enough that it's easy to see why it's not widely offered as out-of-the-box function. But it would certainly be interesting to code.

提交回复
热议问题