In the Saxon 9 HE Java XML parser, word boundaries (\b) in regular expressions are not recognized

扶醉桌前 submitted on 2019-12-19 05:01:08

Question


I have the following simple regular expression:

\b\w+\b

Saxon reports the following error:

syntax error at char 2 in regular expression: Escape character 'b' not allowed

Does this mean I can't use word boundaries with the Java Saxon parser? Is there an alternative free Java XML parser that supports them?


Answer 1:


The regular expression dialect used in XSD and XPath does not recognize \b (either as a word boundary or as a backspace). I think the reason for excluding it was probably a misplaced anxiety that word boundaries are language- or culture-dependent. That is illogical, though: the dialect does support \w (a word character), and a word boundary can be defined simply as a boundary between a character that matches \w and a character that doesn't. Alternatively, the XSD team may have been worried about the ambiguities that arise with zero-length matches, which are a notorious source of bugs and make it very hard to specify rigorously what regular expressions do.

So it's not a Saxon limitation; it's a limitation written into the XPath specification.
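The definition in the answer (a boundary is any position where a \w character meets a non-\w character or an end of the string) can be sketched in plain Java. This is a minimal illustration under stated assumptions, not part of Saxon or of the XPath function library. The class and method names are hypothetical, the code works on UTF-16 chars (so it assumes text without surrogate pairs), and it uses Java's \w, which is ASCII-only by default, whereas the XPath \w is defined much more broadly in terms of Unicode categories. For comparison it also lists where java.util.regex's own \b matches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class illustrating the "\w vs non-\w" definition of a word boundary
public class WordBoundary {
    // Matches exactly one "word character" (Java's \w here, not XPath's broader \w)
    private static final Pattern WORD_CHAR = Pattern.compile("\\w");

    // True if index i is inside the string and the char there matches \w
    static boolean isWordChar(String s, int i) {
        return i >= 0 && i < s.length()
                && WORD_CHAR.matcher(s.substring(i, i + 1)).matches();
    }

    // A boundary is a position where "wordness" differs on either side;
    // positions before the start and after the end count as non-word
    static boolean isBoundary(String s, int i) {
        return isWordChar(s, i - 1) != isWordChar(s, i);
    }

    // All boundary positions 0..s.length(), computed from the definition alone
    static List<Integer> boundaries(String s) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i <= s.length(); i++) {
            if (isBoundary(s, i)) {
                result.add(i);
            }
        }
        return result;
    }

    // For comparison: positions where java.util.regex's built-in \b matches
    static List<Integer> javaBoundaries(String s) {
        List<Integer> result = new ArrayList<>();
        Matcher m = Pattern.compile("\\b").matcher(s);
        while (m.find()) {
            result.add(m.start());
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "hi, you";
        System.out.println(boundaries(text));     // [0, 2, 4, 7]
        System.out.println(javaBoundaries(text)); // [0, 2, 4, 7]
    }
}
```

For ASCII input like this, both methods report the same positions, which is the point the answer makes: \b adds nothing that cannot be expressed in terms of \w.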

If you're not too concerned about standards conformance, Saxon allows you to put "!" at the end of the "flags" argument to indicate that your regular expression is a Java regular expression rather than an XPath regular expression.
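According to the answer, the "!" flag makes Saxon interpret the pattern as a Java regular expression. The sketch below does not call Saxon at all. It only shows how java.util.regex itself handles the question's pattern, and that for this particular pattern plain \w+ (which the XPath dialect does accept) finds the same words, because a greedy \w+ match found by a left-to-right scan always starts and ends at a word boundary. The class name is hypothetical, and the input is deliberately ASCII: some Java versions define \b with a broader notion of word character than \w, so results on non-ASCII text may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; uses only java.util.regex, not Saxon
public class JavaRegexWords {
    // Collect every match of a java.util.regex pattern, scanning left to right
    static List<String> findAll(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "Saxon 9 HE, word_boundaries!";
        // Java's dialect accepts \b, which the XSD/XPath dialect rejects
        System.out.println(findAll("\\b\\w+\\b", text)); // [Saxon, 9, HE, word_boundaries]
        // Without \b: greedy \w+ already stops where \w stops matching
        System.out.println(findAll("\\w+", text));       // [Saxon, 9, HE, word_boundaries]
    }
}
```

Note that \w includes the underscore, which is why word_boundaries comes back as a single token.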



Source: https://stackoverflow.com/questions/25446314/in-saxon-9-he-java-xml-parser-word-boundaries-b-in-regular-expressions-are-n
