negative lookahead regex on elasticsearch

元气小坏坏 提交于 2019-12-01 06:21:44
Wiktor Stribiżew

Solution

You can solve the issue with either of the two:

"value": "~(charge|encode|relate)night~(charge|encode|relate)",

or

.*night.*&~(.*(charge|encode|relate).*)

With an optional (since it is ON by default)

"flags" : "ALL"

How does it work?

In common NFA regular expressions, you usually have negative lookarounds that help restrict a more generic pattern (those that look like (?!...) or (?<!...)). However, in ElasticSearch, you need to use specific optional operators.

The ~ (tilde) is the complement that is *used to negate an atom right after it. An atom is either a single symbol or a group of subpatterns/alternatives inside a group.

NOTE that all ES patterns are anchored at the start and end of string by default, you never need to use ^ and $ common in Perl-like and .NET, and other NFAs.

Thus,

  • ~(charge|encode|relate) - matches any text from the start of the string other than charge, encode and relate
  • night - matches the word night
  • ~(charge|encode|relate) - matches any text other than either of the 3 substrings up to the end of string.

In an NFA regex like Perl, you could write that pattern using a tempered greedy token:

/^(?:(?!charge|encode|relate).)*night(?:(?!charge|encode|relate).)*$/

The second pattern is trickier: common NFA regexes usually do not jump from location to location when matching, thus, lookaheads anchored at the start of text are commonly used. Here, using an INTERSECTION we can just use 2 patterns, where one will be matching the string and the second one should also match the string.

  • .*night.* - match the whole line (as . matches any symbol but a newline, else, use (.|\n)*) with night in it
  • & - and
  • ~(.*(charge|encode|relate).*) - the line that does not have charge, encode and relate substrings in it.

An NFA Perl-like regex would look like

/^(?!.*(charge|encode|relate)).*night.*$/

You didn't use an anchor for your lookaheads. Try using "^" at the beginning of the pattern and it should work.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!