Question
I'm having issues using the word boundary \b in my regular expression. I'm using R, but the issue also occurs when I try http://regexr.com. The pattern I'm using is \bs\.l\.\b, and while I expected lines 1 and 3 below to match this pattern, only line 2 matches:
aaa s.l. bbb
aaa s.l.bbb
aaa s.l., bbb
See http://regexr.com/3f154 as well.
Answer 1:
A word boundary (\b) matches at the following positions:
- Before the first character in the string, if the first character is a word character.
- After the last character in the string, if the last character is a word character.
- Between two characters in the string, where one is a word character and the other is not a word character.
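To connect these rules to the question, here is a minimal sketch in base R that runs the original pattern against the three sample lines. The variable names (`x`, `original`, `matches_original`) are illustrative and not from the original post, and `perl = TRUE` is used only so the engine matches the one used later in this answer.

```r
# The three sample lines from the question
x <- c("aaa s.l. bbb", "aaa s.l.bbb", "aaa s.l., bbb")

# Original pattern: the trailing \b requires a word character on exactly
# one side. The final "." is not a word character, so the character that
# follows it must be a word character for \b to match.
original <- "\\bs\\.l\\.\\b"

matches_original <- grepl(original, x, perl = TRUE)
print(matches_original)
# line 1: FALSE ("." followed by a space)
# line 2: TRUE  ("." followed by "b")
# line 3: FALSE ("." followed by ",")
```

Only the second line matches, because it is the only one where a word character directly follows the final dot.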
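Now, you want to match s.l. when it is preceded by a word boundary and not followed by a word character. Because the final . is not a word character, the trailing \b only matches when a word character follows it, which is why only line 2 matches. Replace the trailing \b with a (?!\w) negative lookahead: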
\bs\.l\.(?!\w)
See the regex demo
Use perl=TRUE if you are using base R functions, since the lookahead needs the PCRE engine; the pattern works as is in stringr functions, which are powered by the ICU regex library.
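As a rough check of the suggested pattern, the sketch below applies it to the sample lines from the question using base R with `perl = TRUE`. The variable names (`x`, `fixed`, `matches_fixed`, `hits`) are illustrative assumptions, not part of the original answer.

```r
# The three sample lines from the question
x <- c("aaa s.l. bbb", "aaa s.l.bbb", "aaa s.l., bbb")

# Leading \b kept; trailing \b replaced by a negative lookahead that
# only forbids a word character right after the final dot.
fixed <- "\\bs\\.l\\.(?!\\w)"

# perl = TRUE selects the PCRE engine, which supports lookaheads
matches_fixed <- grepl(fixed, x, perl = TRUE)
print(matches_fixed)
# line 1: TRUE, line 2: FALSE, line 3: TRUE

# Extract the matched text from the lines that matched
hits <- regmatches(x, regexpr(fixed, x, perl = TRUE))
print(hits)
```

With stringr, something like `str_detect(x, "\\bs\\.l\\.(?!\\w)")` should give the same result without any extra argument, as the answer indicates.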
Answer 2:
. is not a word character, so there is no word boundary between the final . and the space or comma that follows it.
Source: https://stackoverflow.com/questions/41537513/word-boundary-regex-issue