Python regular expression "\1"


Can anyone tell me what "\1" means in the following regular expression in Python?

re.sub(r'(\b[a-z]+) \1', r'\1', 'cat in the the hat')


                      


      
      
        
5 Answers

        
                         				            
            
           
            
                              
                
              
              
                
                  广开言路        
                
              
                            
                2020-12-24 01:52
              
            
            
                                                                       
Example

The following code uses a Python regex to find a digit that is repeated four times in a row in the given string:

import re

# (\d) captures one digit; \1{3} requires that same digit three more times
result = re.search(r'(\d)\1{3}', '54222267890')

print(result.group())

This gives the output



2222
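To see what the group and the backreference each contribute, here is a small sketch built on the same pattern. The second string, '1123334', and the `(\d)\1+` variant are illustrative assumptions, not part of the original answer.

```python
import re

text = '54222267890'  # sample string from the answer above

# (\d) captures one digit; \1{3} requires that same digit three more times.
match = re.search(r'(\d)\1{3}', text)
print(match.group())   # 2222  (the whole match)
print(match.group(1))  # 2     (what \1 refers to)
print(match.span())    # (2, 6)

# Variant (assumption): \1+ accepts a run of any length >= 2.
# finditer is used because findall would return only the captured digit.
runs = [m.group() for m in re.finditer(r'(\d)\1+', '1123334')]
print(runs)            # ['11', '333']
```

Note that `(\d){4}` would not do the same job: it matches any four digits, while `(\d)\1{3}` insists that all four are the same digit.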
                                                                        
                                                        
            
            
              
                
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  我寻月下人不归        
                
              
                            
                2020-12-24 01:59
              
            
            
                                                                       
The first \1 refers to the first group, i.e. the first parenthesized expression, (\b[a-z]+).

From the docs, under \number:

"Matches the contents of the group of the same number. Groups are numbered starting from 1. For example, (.+) \1 matches 'the the' or '55 55', but not 'thethe' (note the space after the group)"

In your case, it is looking for a repeated "word" (well, a block of lowercase letters).

The second \1 is the replacement to use in case of a match, so a repeated word is replaced by a single occurrence of that word.
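Running the call from the question shows both roles of \1, once in the pattern and once in the replacement. This is only a minimal sketch; the variable names are chosen for illustration.

```python
import re

# Pattern from the question: a lowercase word, one space, then the same word again.
pattern = r'(\b[a-z]+) \1'

# In the replacement string, \1 stands for the text captured by group 1,
# so the doubled word collapses to a single copy.
deduped = re.sub(pattern, r'\1', 'cat in the the hat')
print(deduped)  # cat in the hat
```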
                                                                        
                                                        
            
            
              
                
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  梦谈多话        
                
              
                            
                2020-12-24 02:02
              
            
            
                                                                       
\1 is a backreference.
It matches whatever was matched inside your brackets, in this case "the".

You are basically saying:

match the empty string at the beginning of a word (\b)
match lowercase letters from a to z, one or more times ([a-z]+)
match a single space
match the text captured in the brackets again (\1)


cat in (the) the hat
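The walk-through above can be checked by inspecting the match object. The second half of this sketch goes beyond the original answer and is labeled as an assumption: because the pattern has no word boundary after \1, it also matches when the second word merely starts with the first, and a trailing \b is one possible way to prevent that.

```python
import re

pattern = r'(\b[a-z]+) \1'
m = re.search(pattern, 'cat in the the hat')

print(m.group(1))  # the      (text captured by the brackets)
print(m.group(0))  # the the  (group 1, the space, then the backreference)
print(m.span())    # (7, 14)

# Assumption, not from the answer: without a trailing \b the pattern also
# matches "the the" inside "the then", so re.sub drops a non-duplicate word.
loose = re.sub(pattern, r'\1', 'the then')
strict = re.sub(r'(\b[a-z]+) \1\b', r'\1', 'the then')
print(loose)   # then
print(strict)  # the then
```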
                                                                        
                                                        
            
            
              
                
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  挽巷        
                
              
                            
                2020-12-24 02:07
              
            
            
                                                                       
From the Python docs for the re module:


  \number
  
  Matches the contents of the group of the same number. Groups are
  numbered starting from 1. For example, (.+) \1 matches 'the the' or
  '55 55', but not 'thethe' (note the space after the group). This
  special sequence can only be used to match one of the first 99 groups.
  If the first digit of number is 0, or number is 3 octal digits long,
  it will not be interpreted as a group match, but as the character with
  octal value number. Inside the '[' and ']' of a character class, all
  numeric escapes are treated as characters.


Your example is basically the same as what is explained in the docs.
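The cases named in the quoted documentation can be tried directly. This is a small sketch; the octal and character-class lines use example values ('A' and the character with code 1) picked here for illustration.

```python
import re

pattern = re.compile(r'(.+) \1')

print(bool(pattern.search('the the')))  # True
print(bool(pattern.search('55 55')))    # True
print(bool(pattern.search('thethe')))   # False (no space after the group)

# Three octal digits are read as a character code, not a group reference:
# octal 101 is 65, the letter 'A'.
octal_match = re.fullmatch(r'\101', 'A')
print(bool(octal_match))                # True

# Inside a character class, \1 is the character with code 1, not group 1.
class_match = re.fullmatch(r'[\1]', '\x01')
print(bool(class_match))                # True
```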
                                                                        
                                                        
            
            
              
                
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  被撕碎了的回忆        
                
              
                            
                2020-12-24 02:15
              
            
            
                                                                       
\1 refers to the same text as re.search(...).group(1): whatever was matched by the first parenthesized group in the regex.
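A quick way to convince yourself of this correspondence, sketched with the pattern and string from the question (the variable names and the lambda-based replacement are illustrative choices):

```python
import re

pattern = r'(\b[a-z]+) \1'
text = 'cat in the the hat'

m = re.search(pattern, text)
# The whole match is "<group 1> <backreference>"; both halves equal group(1).
first, second = m.group(0).split(' ')
print(first == second == m.group(1))  # True

# In re.sub, the replacement r'\1' behaves like a function returning group(1).
with_backref = re.sub(pattern, r'\1', text)
with_function = re.sub(pattern, lambda match: match.group(1), text)
print(with_backref == with_function)  # True
```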

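Fun fact: backreferences are also part of the reason regular expressions in Python and many other programming languages can be significantly slower than CS theory requires. A pattern with a backreference no longer describes a regular language, so it cannot be matched by a plain finite automaton, and engines that support backreferences typically rely on backtracking instead.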
                                                                        
                                                        
            
            
              
                
                        
                    
                
          
          	          
                             
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         