I need to match only those words that don't have special characters like @ and :.
For example, git@github.com shouldn't match.
You may add \w to the lookahead:
\w+(?![\w@:])
The equivalent is using a word boundary:
\w+\b(?![@:])
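To see why the extra \w matters, here is a minimal Python sketch. It assumes the original pattern was \w+(?![@:]) (the question's regex is not shown, so that is a guess), and the sample string is made up for illustration. Without \w in the lookahead, the engine simply backtracks one character and matches a shorter piece of the word.

```python
import re

# Hypothetical sample text, not taken from the question.
text = "git@github.com key: value word"

# Assumed starting point: the lookahead only rejects @ and :,
# so \w+ can give back one character and still succeed.
naive = re.findall(r"\w+(?![@:])", text)

# \w inside the lookahead blocks that backtracking.
fixed = re.findall(r"\w+(?![\w@:])", text)

# Same idea written with a word boundary.
boundary = re.findall(r"\w+\b(?![@:])", text)

print(naive)     # ['gi', 'github', 'com', 'ke', 'value', 'word']
print(fixed)     # ['github', 'com', 'value', 'word']
print(boundary)  # ['github', 'com', 'value', 'word']
```

Note that github and com are still matched here, because nothing restricts what comes before the word.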
Besides, you may consider adding a left-hand boundary to avoid matching words inside non-word non-whitespace chunks of text:
^\w+(?![\w@:])
Or
(?<!\S)\w+(?![\w@:])
The ^ anchors the match at the start of the string, while (?<!\S) allows a match only when the word is preceded by whitespace or sits at the start of the string.
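A quick Python sketch of the difference the left-hand boundary makes. The sample string is hypothetical, and the lookbehind variant is assumed to be (?<!\S)\w+(?![\w@:]):

```python
import re

# Hypothetical sample: a standalone word, an address, and a key: value pair.
text = "hello git@github.com key: value"

# No left-hand boundary: pieces of the address still match.
no_left_boundary = re.findall(r"\w+(?![\w@:])", text)

# ^ only allows a match at the very start of the string.
anchored = re.findall(r"^\w+(?![\w@:])", text)

# (?<!\S) requires whitespace or start of string before the word.
after_whitespace = re.findall(r"(?<!\S)\w+(?![\w@:])", text)

print(no_left_boundary)  # ['hello', 'github', 'com', 'value']
print(anchored)          # ['hello']
print(after_whitespace)  # ['hello', 'value']
```

The lookbehind version is probably the one you want when scanning a whole line, since ^ can match at most once per string unless re.MULTILINE is used.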
See the regex demo.
Why not (?<!\S)\w+(?!\S), the whitespace boundaries? Because you are building a lexer, you most probably have to deal with natural language sentences where words are likely to be followed by punctuation, and the (?!\S) negative lookahead would make \w+ match only when it is followed by whitespace or is at the end of the string.
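Here is a small Python comparison of the two right-hand boundaries on a made-up sentence with punctuation. Both patterns are assumed to use the (?<!\S) left-hand boundary:

```python
import re

# Hypothetical natural-language input with trailing punctuation.
text = "Hello, world! See git@github.com now."

# Whitespace boundaries on both sides: words followed by punctuation are lost.
strict = re.findall(r"(?<!\S)\w+(?!\S)", text)

# Only @, : and further word characters are rejected on the right.
lenient = re.findall(r"(?<!\S)\w+(?![\w@:])", text)

print(strict)   # ['See']
print(lenient)  # ['Hello', 'world', 'See', 'now']
```

With the strict version, Hello, world and now are all dropped because a comma, exclamation mark or period follows them, which is rarely what a lexer for ordinary sentences needs.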