Unmatch complete words if a negative lookahead is satisfied

慢半拍i 2021-01-22 12:13

I need to match only those words that don't contain special characters such as @ and :. For example:

  1. git@github.com shouldn't
2 answers
  •  感情败类
    2021-01-22 12:35

    You may add \w to the lookahead:

    \w+(?![\w@:])
    

    The equivalent is using a word boundary:

    \w+\b(?![@:])
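
To see why the extra \w (or the word boundary) matters, here is a minimal Python sketch. The sample string and variable names are made up for illustration, and the "naive" pattern is only an assumption about what the original attempt looked like. Without \w in the lookahead, the engine backtracks and returns a shorter prefix of the word instead of rejecting it.

```python
import re

# Hypothetical sample text, not taken from the question.
text = "git@github.com key: value plain"

# Assumed original attempt: backtracking lets "gi" and "ke" slip through,
# because after giving back one character the next char is no longer @ or :.
naive = re.findall(r"\w+(?![@:])", text)

# Adding \w to the lookahead forbids stopping in the middle of a word.
with_w = re.findall(r"\w+(?![\w@:])", text)

# A word boundary before the lookahead has the same effect.
with_b = re.findall(r"\w+\b(?![@:])", text)

print(naive)   # ['gi', 'github', 'com', 'ke', 'value', 'plain']
print(with_w)  # ['github', 'com', 'value', 'plain']
print(with_b)  # ['github', 'com', 'value', 'plain']
```

Note that "github" and "com" still match here, since nothing constrains what comes before the word.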
    

    Besides, you may consider adding a left-hand boundary to avoid matching words inside non-word non-whitespace chunks of text:

    ^\w+(?![\w@:])
    

    Or

    (?<!\S)\w+(?![\w@:])

    The ^ will match only a word at the start of the string, while (?<!\S) will match only if the word is preceded by whitespace or sits at the start of the string.
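
A short Python sketch of the left-hand boundary, under two assumptions: the sample string is invented, and the lookbehind is written as (?<!\S), which is what the description "preceded by whitespace or start of string" suggests.

```python
import re

# Hypothetical sample text, not taken from the question.
text = "plain git@github.com key: value"

# No left-hand boundary: words inside "git@github.com" still match.
unbounded = re.findall(r"\w+(?![\w@:])", text)

# ^ only allows a word at the very start of the string.
anchored = re.findall(r"^\w+(?![\w@:])", text)

# (?<!\S) allows a word at the start of the string or after whitespace.
bounded = re.findall(r"(?<!\S)\w+(?![\w@:])", text)

print(unbounded)  # ['plain', 'github', 'com', 'value']
print(anchored)   # ['plain']
print(bounded)    # ['plain', 'value']
```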

    See the regex demo.

    Why not (?<!\S)\w+(?!\S), the whitespace boundaries? Since you are building a lexer, you most probably have to deal with natural language sentences where words are likely to be followed by punctuation, and the (?!\S) negative lookahead would make \w+ match only when it is followed by whitespace or is at the end of the string.
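
The difference shows up on ordinary punctuated text. This is a rough Python comparison with an invented sentence; the (?<!\S) lookbehind is assumed from the description of the left-hand boundary above.

```python
import re

# Hypothetical sentence with trailing punctuation after most words.
sentence = "Hello, world! See git@github.com now."

# Left whitespace boundary, right side only rejects word chars, @ and :.
lexer_style = re.findall(r"(?<!\S)\w+(?![\w@:])", sentence)

# Whitespace boundaries on both sides: words followed by , ! . are lost.
whitespace_only = re.findall(r"(?<!\S)\w+(?!\S)", sentence)

print(lexer_style)      # ['Hello', 'world', 'See', 'now']
print(whitespace_only)  # ['See']
```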
