I want my XSD to validate the contents of a string. To be specific, I want to validate that a certain string does not occur.
Consider this rule, which
This is simpler to do in XSD 1.1, where you can use assertions to ensure that the value does not begin with the string you specify. But conceptually, it's simple enough even in XSD 1.0 with simple regular expressions: you want to ensure that the string does not begin with "/site/example.com". If it did begin that way, you'd have a logical conjunction of a series of facts about the string: its first character is "/", its second character is "s", its third character is "i", and so on through its seventeenth character, "m".
You want to negate this conjunction of facts. Now, by De Morgan's Laws, ~(a and b and ... and z) is equivalent to (~a or ~b or ... or ~z). So you can do what you need by writing a disjunction of the following terms:
[^/].*
|.([^s].*)?
|.{2}([^i].*)?
|.{3}([^t].*)?
|.{4}([^e].*)?
|.{5}([^/].*)?
|.{6}([^e].*)?
|.{7}([^x].*)?
|.{8}([^a].*)?
|.{9}([^m].*)?
|.{10}([^p].*)?
|.{11}([^l].*)?
|.{12}([^e].*)?
|.{13}([^\.].*)?
|.{14}([^c].*)?
|.{15}([^o].*)?
|.{16}([^m].*)?
In each term after the first, the subexpression of the form [^s].* has been wrapped in (...)?. For example, the term .{2}([^i].*)? means that any string beginning with two characters is OK if the third character is not an i, or if there is no third character at all. This ensures that non-empty strings shorter than 17 characters in length are not excluded, even if they happen to be prefixes of the forbidden string. The first term, [^/].*, is not wrapped, so the empty string is rejected; write it as ([^/].*)? if empty values should be accepted as well.
Of course, to use this in an XSD schema document, you will need to remove all the whitespace, which makes the regex harder to read.
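As a rough sketch of what that looks like in practice, here is the disjunction above collapsed onto one line and placed in a pattern facet. The type name NotExampleSitePath and the element name path are made up for illustration; substitute whatever your schema actually uses.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Hypothetical type name. The pattern is the 17-term disjunction
       with all whitespace removed; XSD patterns are implicitly anchored,
       so no ^ or $ is needed. -->
  <xs:simpleType name="NotExampleSitePath">
    <xs:restriction base="xs:string">
      <xs:pattern value="[^/].*|.([^s].*)?|.{2}([^i].*)?|.{3}([^t].*)?|.{4}([^e].*)?|.{5}([^/].*)?|.{6}([^e].*)?|.{7}([^x].*)?|.{8}([^a].*)?|.{9}([^m].*)?|.{10}([^p].*)?|.{11}([^l].*)?|.{12}([^e].*)?|.{13}([^\.].*)?|.{14}([^c].*)?|.{15}([^o].*)?|.{16}([^m].*)?"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- Hypothetical element using the type. -->
  <xs:element name="path" type="NotExampleSitePath"/>

</xs:schema>
```

For comparison, if an XSD 1.1 processor is available, the assertion-based approach mentioned at the start could look something like this (same hypothetical type name; this will not work with an XSD 1.0 validator):

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- XSD 1.1 only: $value is the value being validated. -->
  <xs:simpleType name="NotExampleSitePath">
    <xs:restriction base="xs:string">
      <xs:assertion test="not(starts-with($value, '/site/example.com'))"/>
    </xs:restriction>
  </xs:simpleType>

</xs:schema>
```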
[Addition, June 2016] See also this related and more general question.