Question
I need to match only those words that don't contain special characters like @ and :.
For example:
git@github.com shouldn't match
list should return a valid match
show should also return a valid match
I tried a negative lookahead, \w+(?![@:]), but it matches gi out of git@github.com, and it shouldn't match that either.
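The behaviour described above can be reproduced with a minimal sketch, assuming Python's re module (the question does not name a regex flavor, so the choice of engine here is an assumption):

```python
import re

# The pattern from the question: word characters not followed by @ or :
original = re.compile(r"\w+(?![@:])")

# \w+ first consumes "git"; the lookahead fails because "@" follows,
# so the engine backtracks one character and accepts "gi",
# which is followed by "t" rather than "@" or ":".
first = original.search("git@github.com")
print(first.group())

# Plain words are unaffected.
print(original.findall("list"))
print(original.findall("show"))
```

The partial match comes from backtracking: the lookahead only inspects the single character after wherever \w+ currently ends.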
Answer 1:
You may add \w to the lookahead:
\w+(?![\w@:])
An equivalent pattern uses a word boundary:
\w+\b(?![@:])
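A quick way to check that the two patterns behave alike is to run both over a few inputs. This is a sketch assuming Python's re module; the "key: value" sample is an illustrative addition, not part of the question:

```python
import re

with_lookahead = re.compile(r"\w+(?![\w@:])")
with_boundary = re.compile(r"\w+\b(?![@:])")

# "key: value" is a hypothetical extra sample to exercise the ":" case.
samples = ["git@github.com", "list", "show", "key: value"]

for text in samples:
    a = with_lookahead.findall(text)
    b = with_boundary.findall(text)
    print(f"{text!r}: {a} {b}")

# "git" is no longer partially matched, but "github" and "com"
# still match, because nothing constrains what comes before a word.
```

Both patterns stop the engine from backtracking into the middle of a word, since the match must now end where the run of word characters ends.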
Besides, you may consider adding a left-hand boundary to avoid matching words inside non-word non-whitespace chunks of text:
^\w+(?![\w@:])
Or
(?<!\S)\w+(?![\w@:])
The ^ matches the word only at the start of the string, and (?<!\S) matches only if the word is preceded by whitespace or the start of the string.
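The difference between the two left-hand boundaries can be seen in a short sketch, assuming Python's re module and a made-up input line that mixes the question's three examples:

```python
import re

anchored = re.compile(r"^\w+(?![\w@:])")
ws_left = re.compile(r"(?<!\S)\w+(?![\w@:])")

# Hypothetical input combining the question's examples.
text = "list git@github.com show"

# ^ only allows a match at the very start of the string.
print(anchored.findall(text))  # ['list']

# (?<!\S) allows any word that starts the string or follows whitespace,
# so "github" (after "@") and "com" (after ".") are rejected.
print(ws_left.findall(text))   # ['list', 'show']
```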
See the regex demo.
Why not use the whitespace boundaries, (?<!\S)\w+(?!\S)? Since you are building a lexer, you most probably have to deal with natural language sentences where words are likely to be followed by punctuation, and the (?!\S) negative lookahead would make \w+ match only when it is followed by whitespace or is at the end of the string.
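The effect of punctuation can be illustrated with a sketch, assuming Python's re module and an invented sample sentence:

```python
import re

right_lookahead = re.compile(r"(?<!\S)\w+(?![\w@:])")
whitespace_only = re.compile(r"(?<!\S)\w+(?!\S)")

# Hypothetical sentence with trailing punctuation after several words.
sentence = "Hello, world! Mail git@github.com now."

# Words followed by "," "!" or "." still match here.
print(right_lookahead.findall(sentence))  # ['Hello', 'world', 'Mail', 'now']

# With (?!\S), any word followed by punctuation is dropped.
print(whitespace_only.findall(sentence))  # ['Mail']
```

Which behaviour is preferable depends on whether punctuation-adjacent words should count as tokens in the lexer.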
Answer 2:
You can use negative lookbehind and negative lookahead patterns around a word pattern to make sure that the word is not preceded or followed by a non-space character, or in other words, to make sure that it is surrounded by either a space or a string boundary:
(?<!\S)\w+(?!\S)
Demo: https://regex101.com/r/cjhUUM/2
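As a rough check against the three examples from the question, the whitespace-boundary pattern can be run like this (a sketch assuming Python's re module):

```python
import re

pattern = re.compile(r"(?<!\S)\w+(?!\S)")

# The three inputs given in the question.
results = {text: pattern.findall(text) for text in ["git@github.com", "list", "show"]}

for text, found in results.items():
    print(f"{text!r}: {found}")
```

Here git@github.com yields no matches at all, while list and show each match in full.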
Source: https://stackoverflow.com/questions/58475516/unmatch-complete-words-if-a-negative-lookahead-is-satisfied