Question
I want to write an XSD that restricts the content of valid XML elements of type xsd:token so that, at validation, they would be indistinguishable from the same content declared as xsd:string.
I.e. they do not contain the carriage return (#xD), line feed (#xA) or tab (#x9) characters, do not begin or end with a space (#x20) character, and do not include a sequence of two or more adjacent space characters.
I think the regular expression to use is this:
\S+( \S+)*
(one or more non-whitespace characters, followed by zero or more groups of a single space and one or more non-whitespace characters, so the value always ends in a non-whitespace character)
This works in various regex testing tools, but I can't get it to be enforced in oXygen XML Editor: with double spaces, leading and trailing spaces, tabs, and line breaks in the strings, the XML instance still passes validation.
Here's the XSD implementation:
<xs:simpleType name="Tokenized500Type">
  <xs:restriction base="xs:token">
    <xs:maxLength value="500"/>
    <xs:minLength value="1"/>
    <xs:pattern value="\S+( \S+)*"/>
  </xs:restriction>
</xs:simpleType>
Is there some feature of
- XML, or
- XSD, or
- oXygen XML Editor
that prevents this from working?
Answer 1:
Your original ([^\s])+( [^\s]+)*([^\s])* regex contains some redundant patterns: it matches and captures each iteration of 1+ non-whitespace characters, then matches 0+ sequences of a space and 1+ non-whitespace characters, and then again tries to match and capture each iteration of a non-whitespace character.
You may use a similar, but shorter, pattern:
\S+( \S+)*
Since XML Schema regex is anchored by default, the expression matches:
- \S+ - one or more characters other than whitespace, specifically space, \t (tab), \n (newline) and \r (return)
- ( \S+)* - zero or more sequences of a single space followed by 1+ non-whitespace characters.
This expression disallows consecutive spaces as well as spaces in the leading or trailing position.
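The behaviour described above can be tried outside a schema validator. The following is only a rough sketch in Python, not the XSD regex engine: the names `NON_WS`, `TOKEN_LIKE` and `is_token_like` are made up for illustration, and the character class is spelled out because Python's `\S` covers more whitespace characters than the four that XML Schema excludes.

```python
import re

# XML Schema's \S excludes only space, tab, LF and CR. Python's \S is
# Unicode-aware, so the class is written out explicitly (assumption: this
# is close enough to the XSD semantics for illustration).
NON_WS = r"[^ \t\n\r]"
TOKEN_LIKE = re.compile(rf"{NON_WS}+( {NON_WS}+)*")


def is_token_like(value: str) -> bool:
    """Return True if value has no tab/LF/CR, no leading or trailing
    space, and no run of two or more spaces."""
    # fullmatch imitates the implicit anchoring of XSD patterns.
    return TOKEN_LIKE.fullmatch(value) is not None


if __name__ == "__main__":
    for sample in ["a b c", "a  b", " a", "a ", "a\tb", "a\nb", ""]:
        print(repr(sample), is_token_like(sample))
```

Only the first sample is accepted; the others each break one of the rules listed in the question.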
Here is how the regex should be used:
<xs:simpleType name="Tokenized500Type">
  <xs:restriction base="xs:string">
    <xs:pattern value="\S+( \S+)*"/>
    <xs:maxLength value="500"/>
    <xs:minLength value="1"/>
  </xs:restriction>
</xs:simpleType>
Answer 2:
The base type needs to be xsd:string.
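Using xsd:token as the base collapses the whitespace in the input first, and only THEN checks whether the result matches the pattern. The pattern never sees the whitespace it is meant to reject, so the check is redundant.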
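To illustrate why the base type matters, here is a small Python sketch that approximates the order of operations: whitespace normalization first, pattern check second. It is a simulation under stated assumptions, not a schema validator: `collapse`, `valid_as_token_base` and `valid_as_string_base` are hypothetical helpers, the length facets are left out, and the pattern is written with an explicit character class in place of XSD's `\S`.

```python
import re

# Stand-in for the XSD pattern \S+( \S+)* (XSD \S = anything except
# space, tab, LF, CR).
PATTERN = re.compile(r"[^ \t\n\r]+( [^ \t\n\r]+)*")


def collapse(value: str) -> str:
    """Approximate whiteSpace="collapse": turn tab/LF/CR into spaces,
    squeeze runs of spaces to one, and trim leading/trailing spaces."""
    replaced = re.sub(r"[\t\n\r]", " ", value)
    return re.sub(r" {2,}", " ", replaced).strip(" ")


def valid_as_token_base(value: str) -> bool:
    # base="xs:token": the pattern is checked against the collapsed value.
    return PATTERN.fullmatch(collapse(value)) is not None


def valid_as_string_base(value: str) -> bool:
    # base="xs:string": whitespace is preserved, so the pattern is
    # checked against the value exactly as written.
    return PATTERN.fullmatch(value) is not None


if __name__ == "__main__":
    messy = "  a \t b\n"
    print(valid_as_token_base(messy))   # collapsed to "a b" before the check
    print(valid_as_string_base(messy))  # raw value reaches the pattern
```

With the token-style base, the messy value is cleaned up before the pattern runs, which matches the behaviour reported in the question; with the string-style base, the same value is rejected.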
Source: https://stackoverflow.com/questions/40346316/what-is-the-regular-expression-for-the-set-of-strings-that-validate-exactly-the