.NET Regex Negative Lookahead - what am I doing wrong?

Asked by 一向 on 2021-01-22 17:40

Assuming I have:

StartTest
  NoInclude
EndTest

StartTest
  Include
EndTest

and am using:

/StartTest(?!NoInclude)[\\s\\S]*?EndT

1 answer

Answered by 囚心锁ツ on 2021-01-22 17:55

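Your lookahead only fails the match if NoInclude appears immediately after StartTest. In your input, StartTest is followed by a newline and two spaces, so (?!NoInclude) succeeds for both blocks, and the lazy [\s\S]*? then consumes NoInclude like any other text. You need a tempered greedy token: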

(?s)StartTest(?:(?!(?:Start|End)Test|NoInclude).)*EndTest
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
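
A minimal C# sketch of why the question's lookahead does not help. It runs the question's pattern as quoted, keeping the truncated EndT ending, and assumes the doubled backslashes in [\\s\\S] were an escaping artifact (so the class is written [\s\S] in a verbatim string). The names LookaheadPositionDemo and MatchValues are hypothetical, chosen for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class LookaheadPositionDemo
{
    // The question's pattern as quoted (truncated ending kept as "EndT"),
    // assuming the doubled backslashes were an escaping artifact.
    public const string QuestionPattern = @"StartTest(?!NoInclude)[\s\S]*?EndT";

    // Hypothetical helper: returns the text of every match.
    public static List<string> MatchValues(string pattern, string input) =>
        Regex.Matches(input, pattern)
             .Cast<Match>()
             .Select(m => m.Value)
             .ToList();

    public static void Main()
    {
        var input = "StartTest\n  NoInclude\nEndTest\n\nStartTest\n  Include\nEndTest";

        // Both blocks match: right after "StartTest" the text is "\n  NoInclude",
        // which does not start with "NoInclude", so (?!NoInclude) succeeds.
        Console.WriteLine(MatchValues(QuestionPattern, input).Count); // prints 2

        // The lookahead only rejects a block when NoInclude follows StartTest directly.
        var adjacent = "StartTestNoInclude\nEndTest";
        Console.WriteLine(MatchValues(QuestionPattern, adjacent).Count); // prints 0
    }
}
```

The lookahead is a zero-width check at a single position; once it succeeds, nothing stops [\s\S]*? from consuming NoInclude further on.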

The pattern matches StartTest, then any run of characters that does not contain StartTest, EndTest or NoInclude, and finally EndTest.

Because * is greedy, the . consumes as many characters as it can. However, the negative lookahead is checked before each character, so the repetition stops at any position that is followed by one of these alternatives:


(?:Start|End)Test - StartTest or EndTest
NoInclude - just NoInclude.
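
The following sketch (hypothetical class TemperedTokenDemo) illustrates both effects of the lookahead: the repetition halts right before NoInclude, and excluding StartTest keeps a match from spanning an unclosed StartTest. NoStartGuard is a made-up variant for comparison only, not part of the answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TemperedTokenDemo
{
    // The answer's tempered greedy token without the trailing EndTest,
    // to show where the repetition stops.
    public const string TokenOnly = @"(?s)StartTest(?:(?!(?:Start|End)Test|NoInclude).)*";

    // The full pattern from the answer.
    public const string Full = @"(?s)StartTest(?:(?!(?:Start|End)Test|NoInclude).)*EndTest";

    // Hypothetical variant that drops StartTest from the excluded alternatives.
    public const string NoStartGuard = @"(?s)StartTest(?:(?!EndTest|NoInclude).)*EndTest";

    public static void Main()
    {
        // The repetition stops right before "NoInclude", so the match is "StartTest\n  ".
        var stop = Regex.Match("StartTest\n  NoInclude\nEndTest", TokenOnly);
        Console.WriteLine(stop.Value.Length); // prints 12

        // An unclosed StartTest followed by a complete block.
        var unbalanced = "StartTest\n  A\nStartTest\n  B\nEndTest";

        // With StartTest excluded, only the inner, complete block matches.
        Console.WriteLine(Regex.Match(unbalanced, Full).Value == "StartTest\n  B\nEndTest"); // prints True

        // Without it, the match starts at the first StartTest and swallows the second.
        Console.WriteLine(Regex.Match(unbalanced, NoStartGuard).Index); // prints 0
    }
}
```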


NOTE: (?s) is an inline modifier, equivalent to the RegexOptions.Singleline flag, that changes the behavior of . so that it also matches LF (\n, the newline character). Without this modifier (or RegexOptions.Singleline), . matches any character except \n.
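
A short sketch, using a hypothetical SinglelineDemo class, checking that the inline (?s) and RegexOptions.Singleline behave the same, and that the pattern fails on a multi-line block when neither is set.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SinglelineDemo
{
    // The answer's pattern without the inline (?s) modifier.
    public const string Body = @"StartTest(?:(?!(?:Start|End)Test|NoInclude).)*EndTest";
    public const string Block = "StartTest\n  Include\nEndTest";

    public static void Main()
    {
        // Inline modifier: (?s) turns on Singleline mode for the rest of the pattern.
        Console.WriteLine(Regex.IsMatch(Block, "(?s)" + Body)); // prints True

        // Same effect via the options argument.
        Console.WriteLine(Regex.IsMatch(Block, Body, RegexOptions.Singleline)); // prints True

        // Without either, '.' cannot consume '\n', so a block spanning lines fails.
        Console.WriteLine(Regex.IsMatch(Block, Body)); // prints False
    }
}
```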

NOTE2: If you test a regex outside your own code, make sure the tester supports your regex flavor. At the time of writing, regexr.com supported only the JavaScript flavor, regex101.com supported the JS, PCRE and Python flavors, and RegexStorm.net and RegexHero.net supported the .NET flavor. There are many other testers; check which flavors each one supports before relying on its results.

Here is a C# demo:

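using System;
using System.Linq;
using System.Text.RegularExpressions;

public class Test
{
    public static void Main()
    {
        var input = "StartTest\n  NoInclude\nEndTest\n\nStartTest\n  Include\nEndTest";
        // (?s) lets '.' cross line breaks; the tempered token refuses to step over
        // StartTest, EndTest or NoInclude, so only the second block can match.
        var regex = new Regex(@"(?s)StartTest(?:(?!(?:Start|End)Test|NoInclude).)*EndTest");
        var results = regex.Matches(input)
                           .Cast<Match>()          // MatchCollection -> IEnumerable<Match>
                           .Select(p => p.Value)
                           .ToList();
        // Prints only the second block (StartTest / Include / EndTest).
        Console.WriteLine(string.Join("\n", results));
    }
}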