Unmatch complete words if a negative lookahead is satisfied

慢半拍i 2021-01-22 12:13

I need to match only those words that don't contain special characters like @ and :. For example:

  1. git@github.com shouldn't
2 answers
  • 2021-01-22 12:33

    You can use negative lookbehind and negative lookahead patterns around a word pattern to make sure that the word is not preceded or followed by a non-space character, or in other words, to make sure that it is surrounded by either a space or a string boundary:

    (?<!\S)\w+(?!\S)
    

    Demo: https://regex101.com/r/cjhUUM/2
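
As a rough illustration of the whitespace-boundary pattern above, here is a minimal Python sketch. The helper name `plain_words` and the sample string are assumptions made up for this example; they are not from the question.

```python
import re

# Whitespace-boundary pattern from the answer above: the word must not be
# preceded or followed by any non-whitespace character.
WHOLE_TOKEN = re.compile(r"(?<!\S)\w+(?!\S)")

def plain_words(text):
    """Return the words that stand alone between whitespace or string edges."""
    return WHOLE_TOKEN.findall(text)

# Hypothetical input, chosen only to exercise "@" and ":".
sample = "clone git@github.com from host:port now"
print(plain_words(sample))  # ['clone', 'from', 'now']
```

Every piece of `git@github.com` and `host:port` is rejected because each one touches a non-whitespace character on at least one side.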

  • 2021-01-22 12:35

    You may add \w to the lookahead:

    \w+(?![\w@:])
    

    The equivalent is using a word boundary:

    \w+\b(?![@:])
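
One way to convince yourself that the two forms behave the same is to run both over a few inputs. This is only a spot check in Python, not a proof, and the sample strings and the helper `matches_agree` are assumptions invented for the example.

```python
import re

# The two patterns the answer describes as equivalent.
LOOKAHEAD = re.compile(r"\w+(?![\w@:])")
BOUNDARY = re.compile(r"\w+\b(?![@:])")

# Hypothetical sample inputs.
SAMPLES = ["foo bar", "git@github.com", "key: value", "a_b c9"]

def matches_agree(text):
    """True when both patterns find exactly the same words in text."""
    return LOOKAHEAD.findall(text) == BOUNDARY.findall(text)

for text in SAMPLES:
    print(repr(text), LOOKAHEAD.findall(text), matches_agree(text))
```

In both forms the engine cannot stop in the middle of a word: the first forbids a following word character in the lookahead, the second forbids it through `\b`.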
    

    Besides, you may consider adding a left-hand boundary to avoid matching words inside non-word non-whitespace chunks of text:

    ^\w+(?![\w@:])
    

    Or

    (?<!\S)\w+(?![\w@:])
    

    The ^ will match the word only at the start of the string, and (?<!\S) will match only if the word is preceded by whitespace or is at the start of the string.
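
To see what the left-hand boundary changes, the following Python sketch compares the unanchored pattern with the two anchored variants. The constant names and the inputs are assumptions for illustration only.

```python
import re

UNANCHORED = re.compile(r"\w+(?![\w@:])")           # no left-hand boundary
START_ONLY = re.compile(r"^\w+(?![\w@:])")          # word at start of string
LEFT_BOUNDED = re.compile(r"(?<!\S)\w+(?![\w@:])")  # whitespace or start on the left

text = "git@github.com"
# Without a left-hand boundary, the tail of the address still matches.
print(UNANCHORED.findall(text))    # ['github', 'com']
# With (?<!\S), "github" follows "@" and "com" follows ".", so nothing matches.
print(LEFT_BOUNDED.findall(text))  # []

# ^ only ever yields the first word of the string.
print(START_ONLY.findall("hello world"))    # ['hello']
print(LEFT_BOUNDED.findall("hello world"))  # ['hello', 'world']
```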

    See the regex demo.

    Why not the whitespace boundaries, (?<!\S)\w+(?!\S)? Since you are building a lexer, you most probably have to deal with natural language sentences where words are likely to be followed by punctuation. The (?!\S) negative lookahead would make \w+ match only when it is followed by whitespace or is at the end of the string, so a word followed by a comma or a period would be skipped.
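
The difference shows up as soon as punctuation is involved. Below is a small Python comparison under the assumption of a made-up sentence; the constant names are hypothetical as well.

```python
import re

# Strict whitespace boundaries on both sides.
WHITESPACE_ONLY = re.compile(r"(?<!\S)\w+(?!\S)")
# Whitespace boundary on the left, only \w, "@" and ":" forbidden on the right.
PUNCT_TOLERANT = re.compile(r"(?<!\S)\w+(?![\w@:])")

sentence = "Hello, world! See git@github.com."

# "Hello" and "world" are followed by punctuation, so the strict form drops them.
print(WHITESPACE_ONLY.findall(sentence))  # ['See']
# The tolerant form keeps them and still rejects the address.
print(PUNCT_TOLERANT.findall(sentence))   # ['Hello', 'world', 'See']
```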
