Question
I have the following simple regular expression:
\b\w+\b
Saxon reports the following error:
syntax error at char 2 in regular expression: Escape character 'b' not allowed
Does this mean I can't use word boundaries with the Java Saxon parser? Is there an alternative free Java XML parser that supports them?
Answer 1:
The regular expression dialect used in XSD and XPath does not recognize \b (either as a word boundary or as a backspace). I think it was probably excluded out of a misplaced anxiety that word boundaries are language- or culture-dependent. That's illogical, though, since the dialect does support \w (a word character), and a word boundary can simply be defined as the boundary between a character that matches \w and a character that doesn't. Alternatively, the XSD team may have been worried about the ambiguities that arise with zero-length matches, which are a notorious source of bugs and make it very hard to specify rigorously exactly what a regular expression does.
So it's not a Saxon limitation, it's a limitation written into the XPath specification.
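The definition of a word boundary given above (a position between a character that matches \w and one that doesn't) can be sketched in Java. This is a minimal illustration, not Saxon code: the class and method names are hypothetical, and it assumes the XSD definition of \w, i.e. every character except those in the Unicode categories P (punctuation), Z (separators) and C (other). That is spelled out explicitly because Java's own \w is ASCII-only ([a-zA-Z_0-9]) by default. The start and end of the string are treated as non-word characters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class WordBoundaries {
    // Assumption: XSD/XPath \w = all characters except Unicode categories P, Z and C.
    private static final Pattern XSD_WORD_CHAR = Pattern.compile("[^\\p{P}\\p{Z}\\p{C}]");

    // True if the code point is a word character under the assumed XSD definition.
    static boolean isWordChar(int codePoint) {
        return XSD_WORD_CHAR.matcher(new String(Character.toChars(codePoint))).matches();
    }

    // Position i (0 <= i <= s.length()) is a boundary when exactly one side of it
    // is a word character; positions outside the string count as non-word.
    static boolean isBoundary(String s, int i) {
        boolean before = i > 0 && isWordChar(s.codePointBefore(i));
        boolean after = i < s.length() && isWordChar(s.codePointAt(i));
        return before != after;
    }

    // Collects every boundary position, stepping over whole code points so that
    // positions inside a surrogate pair are never examined.
    static List<Integer> boundaries(String s) {
        List<Integer> result = new ArrayList<>();
        int i = 0;
        while (true) {
            if (isBoundary(s, i)) {
                result.add(i);
            }
            if (i == s.length()) {
                break;
            }
            i += Character.charCount(s.codePointAt(i));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(boundaries("hi, there")); // prints [0, 2, 4, 9]
        System.out.println(boundaries("$5"));        // prints [0, 2]
    }
}
```

The second call shows one consequence of the XSD definition: '$' is a currency symbol (category Sc), not punctuation, so under this definition "$5" is a single word.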
If you're not too concerned about standards conformance, Saxon allows you to put "!" at the end of the "flags" argument to indicate that your regular expression is a Java regular expression rather than an XPath regular expression.
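With that flag, the pattern is treated as a Java regular expression rather than an XPath one. The sketch below does not call Saxon at all; it uses the standard java.util.regex package directly to show what \b\w+\b matches under Java regex semantics. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class JavaWordMatches {
    // Returns every non-overlapping match of regex in input, scanning left to right.
    static List<String> findAll(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "The cat's hat, 2 times.";
        System.out.println(findAll("\\b\\w+\\b", text)); // prints [The, cat, s, hat, 2, times]
        System.out.println(findAll("\\w+", text));       // prints the same list
    }
}
```

For this particular pattern the \b anchors add nothing on ASCII input: a greedy \w+ scanned left to right already starts and ends each match at a word boundary. The same reasoning suggests that a plain \w+ in standard XPath would select the same maximal runs of (XSD-defined) word characters without needing the non-conformant flag.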
Source: https://stackoverflow.com/questions/25446314/in-saxon-9-he-java-xml-parser-word-boundaries-b-in-regular-expressions-are-n