Backslash escape sequences and word boundaries in Python regex

Currently using re.sub(re.escape("andrew)"), "SUB", stringVar)

Intended behavior:

stringVar = " andrew) "
re.sub(re.escape("andr…

1 answer

我寻月下人不归, 2021-01-16 00:54

From the Python re module docs:


  \b
  
  Matches the empty string, but only at the beginning or end of a word. 
  A word is defined as a sequence of alphanumeric or underscore characters, 
  so the end of a word is indicated by whitespace or a non-alphanumeric, 
  non-underscore character. Note that formally, \b is defined as the
  boundary between a \w and a \W character (or vice versa), or between \w 
  and the beginning/end of the string, so the precise set of characters 
  deemed to be alphanumeric depends on the values of the UNICODE and 
  LOCALE flags. For example, r'\bfoo\b' matches 'foo', 'foo.', '(foo)',
  'bar foo baz' but not 'foobar' or 'foo3'.
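
To check the quoted behavior directly, here is a small sketch that runs the docs' own r'\bfoo\b' examples. It assumes Python 3, where str patterns are Unicode-aware by default; the wording quoted above, which mentions the UNICODE and LOCALE flags, appears to come from the older Python 2 docs. The last two lines show how the re.ASCII flag changes what counts as \w and therefore where \b can match.

```python
import re

# Pattern from the quoted docs: "foo" as a whole word.
pattern = re.compile(r'\bfoo\b')

# Examples listed in the docs, with the expected outcome.
cases = {
    'foo': True,
    'foo.': True,
    '(foo)': True,
    'bar foo baz': True,
    'foobar': False,  # 'b' after "foo" is \w, so there is no boundary
    'foo3': False,    # '3' after "foo" is \w, so there is no boundary
}

for text, expected in cases.items():
    found = pattern.search(text) is not None
    print(f"{text!r:15} matched={found} expected={expected}")

# Python 3 str patterns are Unicode-aware by default, so 'é' counts as \w
# and blocks the trailing \b; re.ASCII restricts \w to [a-zA-Z0-9_].
print(pattern.search('fooé'))                   # None
print(re.search(r'\bfoo\b', 'fooé', re.ASCII))  # a match object
```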


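In your case, the word boundary is recognized between andrew and ), because ) is the first non-alphanumeric, non-underscore character. If ) is included in the escaped string, the trailing \b has to hold between ) and the character after it; in " andrew) " that character is a space, and since both ) and the space are \W characters there is no boundary there, so nothing matches. The example below illustrates what happens if you include or exclude ')' from the escape.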

>>> import re
>>> stringVar = " andrew) "
>>> re.sub(r'\b%s\b' % re.escape("andrew)"), "SUB", stringVar)
' andrew) '
>>> re.sub(r'\b%s\b' % re.escape("andrew"), "SUB", stringVar)
' SUB) '
>>> stringVar = "zzzandrew)zzz"
>>> re.sub(r'\b%s\b' % re.escape("andrew"), "SUB", stringVar)
'zzzandrew)zzz'
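
One way to see why the first and third examples leave the string unchanged is to list every position where \b actually holds. The sketch below assumes Python 3; boundary_positions is a hypothetical helper, not part of the re module. It relies on re.finditer with the bare pattern r'\b', whose zero-width matches report exactly those positions, and it also prints what re.escape produces for "andrew)".

```python
import re
from typing import List

token = "andrew)"
escaped = re.escape(token)
print(escaped)  # andrew\)   (the ')' is escaped so it is matched literally)


def boundary_positions(text: str) -> List[int]:
    r"""Return every index at which the zero-width \b assertion holds in text."""
    return [m.start() for m in re.finditer(r'\b', text)]


# " andrew) ": boundaries exist only around the word "andrew" (indices 1 and 7).
# Escaping only "andrew" ends the match at index 7, a boundary, hence ' SUB) '.
# Escaping "andrew)" ends the match at index 8, between ')' and ' ', which is
# not a boundary, so r'\b' + escaped + r'\b' cannot match.
print(boundary_positions(" andrew) "))      # [1, 7]

# "zzzandrew)zzz": there is no boundary at index 3 (between 'z' and 'a'),
# so even the leading \b of r'\bandrew\b' fails.
print(boundary_positions("zzzandrew)zzz"))  # [0, 9, 10, 13]
```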


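If you have to keep ')' as part of the escaped string, you can replace the trailing \b with a positive lookahead assertion, as below. (?=\s) matches only if 'andrew)' is followed by whitespace, and (?=\W) matches if it is followed by any non-word (non-alphanumeric, non-underscore) character. Both require a following character, so neither matches when 'andrew)' is at the very end of the string.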

>>> stringVar = " andrew) "
>>> re.sub(r'\b%s(?=\s)' % re.escape("andrew)"), "SUB", stringVar)
' SUB '
>>> stringVar = "zzzandrew)zzz"
>>> re.sub(r'\b%s(?=\s)' % re.escape("andrew)"), "SUB", stringVar)
'zzzandrew)zzz'
>>> stringVar = " andrew) "
>>> re.sub(r'\b%s(?=\W)' % re.escape("andrew)"), "SUB", stringVar)
' SUB '
>>> stringVar = "zzzandrew)zzz"
>>> re.sub(r'\b%s(?=\W)' % re.escape("andrew)"), "SUB", stringVar)
'zzzandrew)zzz'
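
As a variation that goes beyond the answer, negative lookarounds sidestep two limitations of the approaches above: \b needs a word character on one side, so it misbehaves when the token begins or ends with punctuation, and a lookahead such as (?=\s) or (?=\W) needs a following character, so it fails at the end of the string. (?<!\w) and (?!\w) only require that no word character touches the token. This is a swapped-in technique, not the answer's method, and replace_token is a hypothetical helper name. The sketch assumes Python 3 and compares the two lookaheads with the lookaround version on a string where one 'andrew)' is followed by a comma and another ends the string.

```python
import re


def replace_token(token: str, repl: str, text: str) -> str:
    r"""Replace standalone occurrences of the literal `token` (hypothetical helper).

    (?<!\w) and (?!\w) only require that no word character touches the token,
    so they also succeed at the start/end of the string and next to punctuation,
    regardless of whether the token itself begins or ends with \w or \W.
    """
    pattern = r'(?<!\w)' + re.escape(token) + r'(?!\w)'
    # A function as the replacement inserts `repl` literally (no backslash processing).
    return re.sub(pattern, lambda m: repl, text)


text = "Hi andrew), bye andrew)"
escaped = re.escape("andrew)")

print(re.sub(r'\b%s(?=\s)' % escaped, "SUB", text))  # Hi andrew), bye andrew)
print(re.sub(r'\b%s(?=\W)' % escaped, "SUB", text))  # Hi SUB, bye andrew)
print(replace_token("andrew)", "SUB", text))         # Hi SUB, bye SUB
```

Passing a function as the replacement is a deliberate choice: with a plain string, re.sub would interpret backslashes and group references such as \1 inside repl.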