Regular expressions negative lookahead

Submitted by 魔方 西西 on 2019-12-04 04:20:18

I love regex gymnastics! Here is a commented PHP regex:

$re = '/# Find all AS, (but not preceding a XX == null).
    \bas\b               # Match "as"
    (?=                  # But only if...
      (?:                # there exist from 1-150
        [\S\s]           # chars, each of which
        (?!==\s*null)    # are NOT preceding "== null"
      ){1,150}?          # (and do this lazily)
      (?:                # We are done when either
        (?=              # we have reached
          ==\s*(?!null)  # a non NULL conditional
        )                #
      | $                # or the end of string.
      )
    )/ix';

And here it is in JavaScript syntax:

re = /\bas\b(?=(?:[\S\s](?!==\s*null)){1,150}?(?:(?===\s*(?!null))|$))/ig;

This one did make my head hurt a little...

Here is the test data I am using:

text = r"""    var x1 = x as SimpleRes;
    var y1 = y as SimpleRes;
    if(x1.a == y1.a)

however, not capture
    var x1 = x as SimpleRes;
    var y1 = y as SimpleRes;
    if(x1 == null)

nor for that matter
    var x1 = x as SimpleRes;
    var y1 = y as SimpleRes;
    if(somethingunrelated == null) {...}
    if(x1.a == y1.a)"""

Put the .{1,150} inside the lookahead, and replace . with [\s\S] (by default, . doesn't match newlines). Also, the \b before == is probably misleading: it requires a word character immediately before the ==, so it fails on x1 == null.

\bas\b(?![\s\S]{1,150}==\s*null\b)
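A quick Python check of this pattern, as a sketch that assumes Python's `re` module; the two sample strings are hypothetical stand-ins for the question's data. Because the whole 150-character window sits inside the negative lookahead, any `== null` within range rejects the `as`, whichever variable it tests.

```python
import re

# Reject "as" if any "== null" follows within 150 characters.
# [\s\S] is used instead of "." so the window can cross newlines.
PATTERN = re.compile(r"\bas\b(?![\s\S]{1,150}==\s*null\b)")

checked = """var x1 = x as SimpleRes;
if(x1 == null)"""

unchecked = """var x1 = x as SimpleRes;
if(x1.a == y1.a)"""

print(len(PATTERN.findall(checked)))    # 0: a null check is within range
print(len(PATTERN.findall(unchecked)))  # 1: no null check nearby
```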

I think it would help to capture the variable name in a group (...) so you can refer to it with a backreference. Something like the following:

\b(\w+)\b\W*=\W*\w*\W*\bas\b(?![\s\S]{0,150}\b\1\b\W*==\W*\bnull\b)
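A Python sketch of the backreference idea, assuming Python's `re` module (which allows backreferences inside lookaheads); the sample string is a hypothetical stand-in. The negative lookahead has to wrap the 150-character window: placed after `[\s\S]{1,150}`, it would only be tested where that window happens to stop, and would almost always succeed.

```python
import re

# Capture the assigned variable as group 1, then reject the match only if
# that same variable (\1) is compared with null within 150 characters.
# The window is inside the lookahead, so the whole range is checked.
PATTERN = re.compile(
    r"\b(\w+)\b\W*=\W*\w*\W*\bas\b"
    r"(?![\s\S]{0,150}\b\1\b\W*==\W*\bnull\b)"
)

text = """var x1 = x as SimpleRes;
var y1 = y as SimpleRes;
if(x1 == null)"""

# findall returns group 1 only: x1 is null-checked, y1 is not.
print(PATTERN.findall(text))  # ['y1']
```

Unlike the plain `\bas\b` variants, this only suppresses a cast whose own variable is null-checked, so an unrelated `somethingunrelated == null` does not block it.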

The question isn't clear. What EXACTLY do you want? I'm sorry, but I still don't understand, even after reading the question and the comments many times.

Must the code be in C#? In Python? Something else? The question gives no indication.

Do you want a match only if an if(... == ...) line immediately follows a block of var ... = ... lines?

Or may an unrelated line appear BETWEEN the block and the if(... == ...) line without preventing the match?

My code assumes the second option.

Does an if(... == null) line AFTER an if(... == ...) line prevent the match or not?

Since I couldn't tell whether the answer is yes or no, I wrote two regexes, one for each case.

I hope my code is clear enough and addresses your concern.

It is in Python:

import re

ch1 ='''kutgdfxfovuyfuuff
var x1 = x as SimpleRes;
var y1 = y as SimpleRes;
if(x1.a == y1.a)
1618987987849891
'''

ch2 ='''kutgdfxfovuyfuuff
var x1 = x as SimpleRes;
var y1 = y as SimpleRes;
uydtdrdutdutrr
if(x1.a == y1.a)
3213546878'''

ch3='''kutgdfxfovuyfuuff
var x1 = x as SimpleRes;
var y1 = y as SimpleRes;
if(x1 == null)
165478964654456454'''

ch4='''kutgdfxfovuyfuuff
var x1 = x as SimpleRes;
var y1 = y as SimpleRes;
hgyrtdduihudgug
if(x1 == null)
165489746+54646544'''

ch5='''kutgdfxfovuyfuuff
var x1 = x as SimpleRes;
var y1 = y as SimpleRes;
if(somethingunrelated == null ) {...}
if(x1.a == y1.a)
1354687897'''

ch6='''kutgdfxfovuyfuuff
var x1 = x as SimpleRes;
var y1 = y as SimpleRes;
ifughobviudyhogiuvyhoiuhoiv
if(somethingunrelated == null ) {...}
if(x1.a == y1.a)
2468748874897498749874897'''

ch7 = '''kutgdfxfovuyfuuff
var x1 = x as SimpleRes;
var y1 = y as SimpleRes;
if(x1.a == y1.a)
iufxresguygo
liygcygfuihoiuguyg
if(somethingunrelated == null ) {...}
oufxsyrtuy
'''

ch8 = '''kutgdfxfovuyfuuff
var x1 = x as SimpleRes;
var y1 = y as SimpleRes;
tfsezfuytfyfy
if(x1.a == y1.a)
iufxresguygo
liygcygfuihoiuguyg
if(somethingunrelated == null ) {...}
oufxsyrtuy
'''

ch9 = '''kutgdfxfovuyfuuff
var x1 = x as SimpleRes;
var y1 = y as SimpleRes;
tfsezfuytfyfy
if(x1.a == y1.a)
if(somethingunrelated == null ) {...}
oufxsyrtuy
'''

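# Option 1: a later if(... == null) line does NOT prevent the match.
# Raw strings avoid invalid-escape warnings for \S, \s and \( in Python 3.
pat1 = re.compile(r'('
                  r'(^var +\S+ *= *\S+ +as .+[\r\n]+)+?'
                  r'([\s\S](?!==\s*null\b))*?'
                  r'^if *\( *[^\s=]+ *==(?!\s*null).+$'
                  r')',
                  re.MULTILINE)

# Option 2: additionally fail if any further "==" appears within
# 150 characters after the if line (e.g. a following == null check).
pat2 = re.compile(r'('
                  r'(^var +\S+ *= *\S+ +as .+[\r\n]+)+?'
                  r'([\s\S](?!==\s*null\b))*?'
                  r'^if *\( *[^\s=]+ *==(?!\s*null).+$'
                  r')'
                  r'(?![\s\S]{0,150}==)',
                  re.MULTILINE)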


for ch in (ch1, ch2, ch3, ch4, ch5, ch6, ch7, ch8, ch9):
    # Search once per pattern and print the match, or None if there is none.
    m1 = pat1.search(ch)
    print(m1.group() if m1 else None)
    print()
    m2 = pat2.search(ch)
    print(m2.group() if m2 else None)
    print('-----------------------------------------')

Result

var x1 = x as SimpleRes;
var y1 = y as SimpleRes;
if(x1.a == y1.a)

var x1 = x as SimpleRes;
var y1 = y as SimpleRes;
if(x1.a == y1.a)
-----------------------------------------
var x1 = x as SimpleRes;
var y1 = y as SimpleRes;
uydtdrdutdutrr
if(x1.a == y1.a)

var x1 = x as SimpleRes;
var y1 = y as SimpleRes;
uydtdrdutdutrr
if(x1.a == y1.a)
-----------------------------------------
None

None
-----------------------------------------
None

None
-----------------------------------------
None

None
-----------------------------------------
None

None
-----------------------------------------
var x1 = x as SimpleRes;
var y1 = y as SimpleRes;
if(x1.a == y1.a)

None
-----------------------------------------
var x1 = x as SimpleRes;
var y1 = y as SimpleRes;
tfsezfuytfyfy
if(x1.a == y1.a)

None
-----------------------------------------
var x1 = x as SimpleRes;
var y1 = y as SimpleRes;
tfsezfuytfyfy
if(x1.a == y1.a)

None
-----------------------------------------

Let me try to restate your problem:

  1. Look for an "as" assignment (you probably need a better regex to find actual assignments, and you may want to capture the assigned expression, but let's use "\bas\b" for now)
  2. If you see an if (... == null) within 150 characters, don't match
  3. If you don't see an if (... == null) within 150 characters, match

Your expression \bas\b.{1,150}(?!\b==\s*null\b) won't work because of where the negative lookahead sits. It is only tested at the point where .{1,150} stops, so the engine can simply stop one character earlier or later, at a position where the lookahead succeeds, and you end up matching even when there is an if (... == null) within range.
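A small Python sketch of this failure, assuming Python's `re` module; the sample string is a hypothetical stand-in. The first search still matches although a null check follows, because `.{1,150}` stops at the end of the line, where the lookahead is trivially satisfied. Moving the window inside the lookahead is shown for contrast.

```python
import re

text = """var x1 = x as SimpleRes;
if(x1 == null)"""

# The lookahead is tested only where .{1,150} happens to stop (here at the
# end of the first line), so it is satisfied and the search succeeds.
broken = re.search(r"\bas\b.{1,150}(?!\b==\s*null\b)", text)
print(broken.group())  # as SimpleRes;

# With the window inside the lookahead, every position in range is checked.
fixed = re.search(r"\bas\b(?![\s\S]{1,150}==\s*null\b)", text)
print(fixed)  # None
```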

Regexes are really not good at NOT matching something. In this case, you're better off matching an "as" assignment that IS followed by an "if == null" check within 150 characters:

\bas\b[\s\S]{1,150}?==\s*null\b

and then negating the check: if (!regex.match(text)) ...
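In Python, the match-then-negate approach might look like the sketch below. `has_unchecked_as` is a hypothetical helper name. The pattern uses `[\s\S]` instead of `.` so the 150-character window can span lines, and it has no `\b` before `==`, since that would require a word character right before `==` and miss `x1 == null`.

```python
import re

# Positive pattern: an "as" followed by a "== null" check within
# 150 characters (lazy, so it stops at the first such check).
NULL_CHECK_NEARBY = re.compile(r"\bas\b[\s\S]{1,150}?==\s*null\b")


def has_unchecked_as(text: str) -> bool:
    """True if no "as" in text is followed by a null check within range."""
    # Negate a positive search instead of writing a negative regex.
    return NULL_CHECK_NEARBY.search(text) is None


print(has_unchecked_as("var x1 = x as SimpleRes;\nif(x1 == null)"))    # False
print(has_unchecked_as("var x1 = x as SimpleRes;\nif(x1.a == y1.a)"))  # True
```

Note that this answers the question for the whole text at once: a single nearby null check makes the result False even if other casts are unchecked.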

(?s:\s+as\s+(?!.{0,150}==\s*null\b))

I'm enabling the Singleline option with (?s:...) so that . also matches newlines. If you prefer, you can set it in the options of your Regex instead (RegexOptions.Singleline). I'm putting \s around as because I think only whitespace is "legal" around the as. You could probably use \b instead, like this:

(?s:\bas\b(?!.{0,150}==\s*null\b))
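To see why the Singleline flag matters, here is a sketch in Python, whose `re` module also accepts the scoped `(?s:...)` syntax (Python 3.6+); the sample string is a hypothetical stand-in for the question's data.

```python
import re

text = """var x1 = x as SimpleRes;
var y1 = y as SimpleRes;
if(x1 == null)"""

# Without the s flag, "." stops at the newline, so the lookahead never
# reaches the null check on a later line and both "as" are kept.
without_s = re.findall(r"\bas\b(?!.{0,150}==\s*null\b)", text)

# With the scoped (?s:...) flag, "." also matches newlines, and the
# null check within 150 characters rejects both occurrences.
with_s = re.findall(r"(?s:\bas\b(?!.{0,150}==\s*null\b))", text)

print(len(without_s), len(with_s))  # 2 0
```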

Be aware that \s will probably match whitespace characters that aren't "valid" spaces in your code. In .NET it is equivalent to [\f\n\r\t\v\x85\p{Z}], where \p{Z} covers the Unicode categories Separator, Space (Zs), Separator, Line (Zl) and Separator, Paragraph (Zp).
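Python's `re` has a similar caveat, although its exact character set differs from .NET's: for `str` patterns `\s` matches Unicode whitespace such as the no-break space U+00A0 (category Zs), while the `re.ASCII` flag restricts it to `[ \t\n\r\f\v]`. A small sketch, using a hypothetical cast written with no-break spaces:

```python
import re

# A cast whose "spaces" are actually NO-BREAK SPACE characters (U+00A0).
cast = "x\u00a0as\u00a0SimpleRes"

# Default Unicode matching: \s accepts the no-break spaces.
print(bool(re.search(r"\s+as\s+", cast)))            # True

# re.ASCII limits \s to [ \t\n\r\f\v], so the cast is no longer matched.
print(bool(re.search(r"\s+as\s+", cast, re.ASCII)))  # False

# A literal space is stricter still: only U+0020 is accepted.
print(bool(re.search(r" +as +", cast)))              # False
```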
