REGEX: Select KeyWord1 if KeyWord2 is in the same string

坚强是说给别人听的谎言 submitted on 2020-06-25 22:03:51

Question


I am trying to capture KEYWORD1 with the .NET regex engine, depending on whether KeyWord2 is present in the same string. So far, the positive lookahead solution I am using:

(?=.*KeyWord2)KEYWORD1   (flags: m, i)


only captures KEYWORD1 when KeyWord2 appears somewhere after it in the string. How can I change the regex so that it captures every instance of KEYWORD1, regardless of whether KeyWord2 comes before it, after it, or both?
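To make the symptom concrete, here is a minimal sketch. It uses Python's re module rather than .NET, on the assumption that this simple lookahead behaves the same way in both engines; the sample string is borrowed from further down this thread.

```python
import re

# Sample string borrowed from later in this thread.
TEXT = "The KEYWORD1 before KeyWord2 then KEYWORD1 and KEYWORD1 again"

# The pattern from the question, with the i and m flags.
LOOKAHEAD_ONLY = re.compile(r"(?=.*KeyWord2)KEYWORD1", re.IGNORECASE | re.MULTILINE)

# Start offsets of every match the lookahead-only pattern reports.
starts = [m.start() for m in LOOKAHEAD_ONLY.finditer(TEXT)]

# Start offsets of every KEYWORD1 actually present in the string.
all_starts = [m.start() for m in re.finditer(r"KEYWORD1", TEXT, re.IGNORECASE)]

print(starts)      # only the occurrence that precedes KeyWord2
print(all_starts)  # all three occurrences
```

The two KEYWORD1 occurrences that follow KeyWord2 are skipped because, from their position, the lookahead finds no KeyWord2 further to the right.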

I'd really appreciate some insight.

Thank You


Answer 1:


You can use the regex below for your requirement:

\bKEYWORD1\b(?:(?<=\bKeyWord2\b.*?)|(?=.*?\bKeyWord2\b))

Explanation of the above Regular Expression:

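gi - The flags used with the regex: g - global (find every match); i - case-insensitive (to ignore any difference in case). In .NET there is no g flag: Regex.Matches already returns every match, and RegexOptions.IgnoreCase gives case-insensitivity.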

\b - Represents a word boundary.

(?:) - Represents a non-capturing group.

(?=.*?\bKeyWord2\b) - Positive lookahead; it succeeds for every KEYWORD1 that is followed, somewhere later in the string, by KeyWord2.

| - Alternation: the match succeeds if either the lookbehind or the lookahead succeeds. The two alternatives are wrapped in the non-capturing group.

(?<=\bKeyWord2\b.*?) - Positive lookbehind of unbounded width (because it contains the variable-width lazy .*?); it succeeds for every KEYWORD1 that is preceded, somewhere earlier in the string, by KeyWord2.


NOTE - For the record, these engines support infinite lookbehind:

  • .NET (C#, VB.NET etc.)

  • Matthew Barnett's regex module for Python

  • JGSoft (EditPad etc.; not available in a programming language).

  • ECMAScript (JavaScript), since ES2018

As far as I know, they are the only ones.
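
Python's built-in re module is not on that list: it only accepts fixed-width lookbehind, so the pattern above does not compile there. Below is a minimal sketch of a two-step stand-in. It is a different technique from the single regex above, not a translation of it. The function names are hypothetical, and the input is assumed to be a single line, since . in the original pattern does not cross newlines.

```python
import re
from typing import List

# The regex from this answer, unchanged.
ANSWER_1 = r"\bKEYWORD1\b(?:(?<=\bKeyWord2\b.*?)|(?=.*?\bKeyWord2\b))"

TEXT = "The KEYWORD1 before KeyWord2 then KEYWORD1 and KEYWORD1 again"


def stdlib_accepts(pattern: str) -> bool:
    """Return True if Python's built-in re module can compile the pattern."""
    try:
        re.compile(pattern, re.IGNORECASE)
        return True
    except re.error:
        # Raised here because the lookbehind is not fixed-width.
        return False


def find_keyword1(text: str) -> List[str]:
    """Two-step stand-in: return every KEYWORD1, but only if KeyWord2 occurs in text."""
    if not re.search(r"\bKeyWord2\b", text, re.IGNORECASE):
        return []
    return re.findall(r"\bKEYWORD1\b", text, re.IGNORECASE)


print(stdlib_accepts(ANSWER_1))
print(find_keyword1(TEXT))
print(find_keyword1("KEYWORD1 alone"))
```

For multi-line input, one option is to call find_keyword1 on each line separately, which mirrors the per-line scope of the original pattern.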




Answer 2:


If one uses a regex engine that supports \G and \K, the following regular expression could be used.

^(?=.*\bKeyWord2\b)|\G.*?\K\bKEYWORD1\b

with the case-insensitive flag set and, depending on requirements, the multiline flag as well.


With PCRE (PHP) and some other regex engines, the anchor \G matches at the end of the previous match. For the first match attempt, \G is equivalent to \A, matching the start of the string.

\K resets the starting point of the reported match to the current position of the engine's internal string pointer. Any previously consumed characters are not included in the final match. In effect, \K causes the engine to "forget" everything matched up to that point.

There are four matches in the string

The KEYWORD1 before KeyWord2 then KEYWORD1 and KEYWORD1 again

They are an empty string at the beginning of the string, followed by each of the three instances of KEYWORD1. In fact, for every string that matches, one of the matches is an empty string at its beginning. Empty matches must therefore be disregarded when making substitutions.
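
For engines without \G and \K, the same left-to-right walk can be approximated by hand. The sketch below is a rough emulation in Python under stated assumptions, not the PCRE behaviour itself: an explicit check stands in for the ^(?=.*\bKeyWord2\b) alternative, Pattern.match(text, pos) plays the role of \G, and a capture group plays the role of \K. The function name is hypothetical and the input is assumed to be a single line.

```python
import re
from typing import List, Tuple

TEXT = "The KEYWORD1 before KeyWord2 then KEYWORD1 and KEYWORD1 again"

# Stand-in for the first alternative: is KeyWord2 anywhere in the string?
GATE = re.compile(r"(?=.*\bKeyWord2\b)", re.IGNORECASE)

# Stand-in for .*?\K\bKEYWORD1\b: group 1 is what \K would leave in the match.
STEP = re.compile(r".*?(\bKEYWORD1\b)", re.IGNORECASE)


def keyword1_spans(text: str) -> List[Tuple[int, int]]:
    """Return (start, end) spans of every KEYWORD1 if KeyWord2 is present."""
    if not GATE.match(text):
        return []
    spans = []
    pos = 0
    while True:
        # match() anchored at pos: continue exactly where the last match ended.
        m = STEP.match(text, pos)
        if m is None:
            break
        spans.append(m.span(1))  # keep only the KEYWORD1 part
        pos = m.end()
    return spans


for start, end in keyword1_spans(TEXT):
    print(start, end, TEXT[start:end])
```

Because the KeyWord2 check is done separately here, this version produces no empty match that would need to be filtered out.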



Source: https://stackoverflow.com/questions/61831624/regex-select-keyword1-if-keyword2-is-in-the-same-string
