Unmatch complete words if a negative lookahead is satisfied

让人想犯罪 __ submitted on 2020-02-15 10:15:48

Question


I need to match only those words that don't contain special characters such as @ and :. For example:

  1. git@github.com shouldn't match
  2. list should return a valid match
  3. show should also return a valid match

I tried using a negative lookahead: \w+(?![@:])

But it matches gi out of git@github.com, and it shouldn't match that either.
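A minimal sketch that reproduces the problem, assuming Python's re module (the question does not name a language or regex engine). It shows why gi is matched: \w+ can backtrack, so once the engine gives up the t before @, the lookahead sees t instead of @ and succeeds.

```python
import re

# The pattern from the question: one or more word characters
# not immediately followed by "@" or ":".
attempt = re.compile(r"\w+(?![@:])")

# \w+ first grabs "git"; the lookahead sees "@" and fails, so the engine
# backtracks to "gi". The next character is now "t", so the lookahead passes.
print(attempt.findall("git@github.com"))  # ['gi', 'github', 'com']
print(attempt.findall("list"))            # ['list']
```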


Answer 1:


You may add \w to the lookahead:

\w+(?![\w@:])

An equivalent pattern uses a word boundary:

\w+\b(?![@:])
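A quick check of the two patterns above, again assuming Python's re module and a few made-up sample strings. Both patterns reject gi and return the same matches; note that they still match github and com inside git@github.com, because those pieces are followed by . or by the end of the string.

```python
import re

# Adding \w to the lookahead stops \w+ from backtracking to a shorter prefix.
lookahead_fix = re.compile(r"\w+(?![\w@:])")
# \b after \w+ requires the next character to be a non-word char (or the end).
boundary_fix = re.compile(r"\w+\b(?![@:])")

# Hypothetical sample inputs for comparison.
samples = ["git@github.com", "list", "show", "key:value"]
for text in samples:
    print(f"{text!r}: {lookahead_fix.findall(text)} {boundary_fix.findall(text)}")
# 'git@github.com': ['github', 'com'] ['github', 'com']
# 'list': ['list'] ['list']
# 'show': ['show'] ['show']
# 'key:value': ['value'] ['value']
```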

You may also want to add a left-hand boundary so that words embedded in larger non-whitespace chunks of text (such as github and com in git@github.com) are not matched:

^\w+(?![\w@:])

Or

(?<!\S)\w+(?![\w@:])

The ^ matches the word only at the start of the string, and (?<!\S) matches only if the word is preceded by whitespace or is at the start of the string.
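A small sketch of the two left-hand boundaries, assuming Python's re module (where ^ only matches at the very start of the string unless re.MULTILINE is set) and a made-up input line.

```python
import re

# ^ anchors the word to the start of the string (no re.MULTILINE flag).
start_only = re.compile(r"^\w+(?![\w@:])")
# (?<!\S) requires whitespace or the start of the string before the word.
whitespace_left = re.compile(r"(?<!\S)\w+(?![\w@:])")

text = "git@github.com list show"
print(start_only.findall(text))        # []
print(start_only.findall("show me"))   # ['show']
print(whitespace_left.findall(text))   # ['list', 'show']
```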

See the regex demo.

Why not use whitespace boundaries on both sides, (?<!\S)\w+(?!\S)? Since you are building a lexer, you will most likely have to deal with natural-language sentences in which words are often followed by punctuation, and the (?!\S) negative lookahead would let \w+ match only when it is followed by whitespace or the end of the string.
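To illustrate the punctuation point, here is a comparison on a hypothetical sentence, assuming Python's re module. With whitespace on both sides, list and show are lost because they are followed by , and . respectively.

```python
import re

# Whitespace boundaries on both sides: the word must be followed by whitespace or the end.
both_whitespace = re.compile(r"(?<!\S)\w+(?!\S)")
# Whitespace boundary on the left; on the right only word chars, @ and : are forbidden.
left_whitespace = re.compile(r"(?<!\S)\w+(?![\w@:])")

sentence = "list, then show."
print(both_whitespace.findall(sentence))  # ['then']
print(left_whitespace.findall(sentence))  # ['list', 'then', 'show']
```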




Answer 2:


You can use a negative lookbehind and a negative lookahead around a word pattern to make sure that the word is neither preceded nor followed by a non-space character; in other words, that it is surrounded by whitespace or a string boundary:

(?<!\S)\w+(?!\S)

Demo: https://regex101.com/r/cjhUUM/2
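A minimal sketch of this pattern, assuming Python's re module and a made-up input. Any whitespace character (space, tab, newline) counts as a boundary, and a token is accepted only if it consists entirely of word characters, so foo-bar and git@github.com are both skipped.

```python
import re

# A word that is neither preceded nor followed by a non-whitespace character,
# i.e. a whole whitespace-delimited token made only of word characters.
whole_token = re.compile(r"(?<!\S)\w+(?!\S)")

text = "git@github.com list\tshow\nfoo-bar"
print(whole_token.findall(text))  # ['list', 'show']
```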



Source: https://stackoverflow.com/questions/58475516/unmatch-complete-words-if-a-negative-lookahead-is-satisfied
