Question
Maybe I've been staring at this problem for too long, maybe there isn't an answer; either way I'm here now.
I'm trying to permit a set of possible combinations in an XSD, but I can't seem to find an approach that doesn't result in ambiguity.
Quick regex-style representation:
foo+ ( bar baz* | bar? baz+ qux* )
- foo is required (one-or-more)
- If bar exists, baz is optional (zero-or-more)
- If baz exists, bar is optional (zero-or-one) and qux is optional (zero-or-more)
- qux cannot exist if baz does not exist
Ambiguity arises given foo bar baz, which matches both branches of the choice.
Ambiguous XSD document:
<xs:element name="parent">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="foo" minOccurs="1" maxOccurs="unbounded" />
      <xs:choice>
        <xs:sequence>
          <xs:element name="bar" minOccurs="1" maxOccurs="1" />
          <xs:element name="baz" minOccurs="0" maxOccurs="unbounded" />
        </xs:sequence>
        <xs:sequence>
          <xs:element name="bar" minOccurs="0" maxOccurs="1" />
          <xs:element name="baz" minOccurs="1" maxOccurs="unbounded" />
          <xs:element name="qux" minOccurs="0" maxOccurs="unbounded" />
        </xs:sequence>
      </xs:choice>
    </xs:sequence>
  </xs:complexType>
</xs:element>
Now, I'm beginning to realize that perhaps this is simply a constraint of the XSD content model. The reason for ambiguity is obvious; the solution not so.
Can anyone see a way to permit this, either by re-ordering the elements or through some schema design pattern that avoids ambiguous scenarios like this one?
The conditional dependency between bar and baz is clearly the problem, but I can't think of any other way to do this.
Thanks so much in advance folks.
Edit: Currently reading "Schema Component Constraint: Unique Particle Attribution" in an attempt to find a loop-hole. Any other suggested reading welcome.
Answer 1:
IIRC there is a theorem in computer science that says every ambiguous regular grammar can be rewritten as an unambiguous one (this does not hold for context-free grammars in general, but a content model like this one is regular), so start with the hypothesis that it's possible. However, the unambiguous grammar can sometimes be hideously complex.
I think a good approach to handling this is to draw the "railroad diagram" of the grammar, that is, the finite state machine with its transitions. Then when you find a state in this machine that has two transitions labelled with the same symbol, you need to construct a new state that accepts both those transitions, and so on. In the CS literature this algorithm is called "determinization".
Another approach which is perhaps easier to explain without a whiteboard is to start by factoring out what is common between the two branches of your choice. When you hit the first element in the content, it has to be either a bar or a baz. So write two choices, one starting with bar and one with baz.
As far as I can see, your content model is equivalent to the unambiguous model
(bar, (baz+, qux*)?) | (baz+, qux*)
but I would check that carefully...
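Written out as XSD, that model might look roughly like the sketch below. It reuses the element names and the parent wrapper from the question; the enclosing xs:schema element is an assumption added only so the fragment stands on its own. It has not been run through a schema processor here, so treat it as a starting point and validate it with your own tooling.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="parent">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="foo" minOccurs="1" maxOccurs="unbounded" />
        <xs:choice>
          <!-- Branch 1: starts with bar, optionally followed by (baz+, qux*) -->
          <xs:sequence>
            <xs:element name="bar" />
            <xs:sequence minOccurs="0">
              <xs:element name="baz" minOccurs="1" maxOccurs="unbounded" />
              <xs:element name="qux" minOccurs="0" maxOccurs="unbounded" />
            </xs:sequence>
          </xs:sequence>
          <!-- Branch 2: no bar, so at least one baz is required, then qux* -->
          <xs:sequence>
            <xs:element name="baz" minOccurs="1" maxOccurs="unbounded" />
            <xs:element name="qux" minOccurs="0" maxOccurs="unbounded" />
          </xs:sequence>
        </xs:choice>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

The point of the restructuring is that the two branches of the choice now begin with different elements (bar versus baz), so after the last foo a validator can pick a branch from the next element alone, without lookahead.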
Source: https://stackoverflow.com/questions/10594643/content-model-ambiguity-in-a-schema