RegExp: Last occurrence of pattern that occurs before another pattern

I want to capture the last occurrence of a text pattern that appears before another text pattern.

For example, I have this text:

code 4ab6-7b5
Another lorem ipsum
Rando

Answer by 再見小時候, 2021-01-16 01:33

If your regex flavor supports lookaheads, you can use a pattern like this:

^code:[ ]([0-9a-f-]+)(?:(?!^code:[ ])[\s\S])*id-x


Your result will be in capturing group 1.
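As a minimal sketch of reading capturing group 1, here is the pattern used from Python's re module (the answer's own demo used Ruby, so Python is an assumption here). The sample text is hypothetical because the question's example is cut off, and the helper name key_before_marker is made up for illustration.

```python
import re
from typing import Optional

# The pattern from above. re.MULTILINE makes ^ match at the start of every
# line, which both the block start and the lookahead rely on.
PATTERN = re.compile(
    r"^code:[ ]([0-9a-f-]+)(?:(?!^code:[ ])[\s\S])*id-x", re.MULTILINE
)

# Hypothetical input: the sample in the question is truncated, so these
# lines are invented for illustration.
SAMPLE = (
    "code: 4ab6-7b5\n"
    "Another lorem ipsum\n"
    "code: 8a33-9ee\n"
    "Random text\n"
    "id-x\n"
    "code: 1f2e-3d4\n"
    "more text\n"
)


def key_before_marker(text: str) -> Optional[str]:
    """Return capture group 1 (the key of the block containing id-x), or None."""
    match = PATTERN.search(text)
    return match.group(1) if match else None


print(key_before_marker(SAMPLE))  # 8a33-9ee
```

The first block fails because the lookahead stops it at the second "code: " line before any id-x is reached, so the search moves on and succeeds with the block that actually contains the marker.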

How does it work?

^code:[ ]           # match "code: " at the beginning of a line, the square 
                    # brackets are just to aid readability. I recommend always
                    # using them for literal spaces.

(                   # capturing group 1, your key
  [0-9a-f-]+        # match one or more hex-digits or hyphens
)                   # end of group 1

(?:                 # start a non-capturing group; each "instance" of this group
                    # will match a single arbitrary character that does not start
                    # a new "code: " (hence this cannot go beyond the current
                    # block)

  (?!               # negative lookahead; this does not consume any characters,
                    # but causes the pattern to fail, if its subpattern could
                    # match here

    ^code:[ ]       # match the beginning of a new block (i.e. "code: " at the
                    # beginning of another line)

  )                 # end of negative lookahead; if we've reached the beginning
                    # of a new block, this will cause the non-capturing group to
                    # fail. Otherwise, just ignore this.

  [\s\S]            # match one arbitrary character
)*                  # end of non-capturing group, repeat 0 or more times
id-x                # match "id-x" literally
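The annotated layout above is close to what free-spacing mode accepts directly. A sketch assuming Python's re.VERBOSE (Ruby's /x flag is similar): unescaped whitespace and "#" comments are ignored outside character classes, which is exactly why the literal space is written as [ ]. The names KEY_BEFORE_ID and COMPACT are illustrative only.

```python
import re

# Free-spacing version of the annotated pattern. re.VERBOSE ignores
# unescaped whitespace and "#" comments outside character classes, so the
# literal space survives only because it is written as [ ].
KEY_BEFORE_ID = re.compile(
    r"""
    ^code:[ ]          # "code: " at the beginning of a line
    ([0-9a-f-]+)       # group 1: one or more hex digits or hyphens
    (?:                # repeat: one arbitrary character...
      (?!^code:[ ])    # ...provided no new block starts here
      [\s\S]
    )*
    id-x               # the literal marker
    """,
    re.VERBOSE | re.MULTILINE,
)

# The same pattern written on one line, for comparison.
COMPACT = re.compile(
    r"^code:[ ]([0-9a-f-]+)(?:(?!^code:[ ])[\s\S])*id-x", re.MULTILINE
)

text = "code: aa-11\nfoo\ncode: bb-22\nbar id-x\n"
print(KEY_BEFORE_ID.search(text).group(1))  # bb-22
print(COMPACT.search(text).group(1))        # bb-22
```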


The (?:(?!stopword)[\s\S])* pattern lets you match as much as possible without going beyond another occurrence of stopword.
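The same building block works for any stop word. In this sketch, tempered is a hypothetical helper that escapes the stop word and wraps it in the construct; BEGIN and END are made-up markers. It contrasts the result with a plain lazy quantifier, which still starts at the leftmost BEGIN and therefore crosses the second one.

```python
import re


def tempered(stop: str) -> str:
    """Regex fragment matching any run of characters that never crosses `stop`."""
    return r"(?:(?!" + re.escape(stop) + r")[\s\S])*"


text = "BEGIN a BEGIN b END"

# A lazy quantifier stops at the first END but still starts at the first BEGIN.
lazy = re.search(r"BEGIN[\s\S]*?END", text).group(0)

# The tempered version cannot run past a second BEGIN, so only the last
# BEGIN before END can produce a match.
bounded = re.search("BEGIN" + tempered("BEGIN") + "END", text).group(0)

print(repr(lazy))     # 'BEGIN a BEGIN b END'
print(repr(bounded))  # 'BEGIN b END'
```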

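Note that you might have to enable some form of multi-line mode (for example, re.MULTILINE in Python or the m flag in JavaScript) for ^ to match at the beginning of every line; in Ruby, ^ already matches at every line start. The ^ is important to avoid false negatives if your random text contains open:.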
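Both points can be checked with a small Python sketch (Python's re is an assumption here, and the input strings are invented). Without multi-line mode the lookahead's ^ never matches after position 0, so the match runs across blocks and reports the wrong key. Without the ^, a "code: " in the middle of a line is mistaken for a new block and the match fails entirely.

```python
import re

ANCHORED = r"^code:[ ]([0-9a-f-]+)(?:(?!^code:[ ])[\s\S])*id-x"
UNANCHORED = r"code:[ ]([0-9a-f-]+)(?:(?!code:[ ])[\s\S])*id-x"  # ^ removed

# 1) Without re.MULTILINE, ^ only matches at the very start of the string, so
#    the lookahead never fires and the match runs across block boundaries.
blocks = "code: 4ab6-7b5\nlorem\ncode: 8a33-9ee\nipsum id-x\n"
no_flag = re.search(ANCHORED, blocks).group(1)                  # '4ab6-7b5' (wrong block)
with_flag = re.search(ANCHORED, blocks, re.MULTILINE).group(1)  # '8a33-9ee'

# 2) Without ^, a "code: " in the middle of a line is treated as a block start,
#    which stops the match early and produces no result at all.
mid_line = "code: aa-11\nsee code: here\nid-x\n"
unanchored = re.search(UNANCHORED, mid_line, re.MULTILINE)       # None
anchored = re.search(ANCHORED, mid_line, re.MULTILINE).group(1)  # 'aa-11'

print(no_flag, with_flag, unanchored, anchored)
```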

Working demo (using Ruby's regex flavor, as I'm not sure which one you are ultimately going to use)