Regular expression for matching non-whitespace in Python


I want to use re.search to extract the first set of non-whitespace characters. I have the following pseudoscript that recreates my problem:

#!/usr/b


                      
4 answers

执笔经年
2021-01-20 01:33

import re
line = "STARC-1.1.1.5             ConsCase    WARNING    Warning"
m = re.search('S.+[0-9]',line)
print(m.group(0))

re.search returns a match object, so call m.group(0) to get the matched text, as in the code above. Printing m on its own shows the match object's representation rather than the matched string.
Hope this answers your question.
m = re.search('[A-Z].+[0-9]',line)

Starting the pattern with [A-Z] makes the match begin at an uppercase letter (A to Z). Conversely, if you change it to lowercase letters, as in
m = re.search('[a-z].+[0-9]',line)

the match must begin at a lowercase letter. Sometimes you also need to include symbols in the pattern, either to match them or to match up to the characters just before a given symbol.
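To make the difference between the match object and the matched text concrete, here is a small sketch on the same line. Keep in mind that .+ is greedy, so these patterns run to the last digit on the line; that coincides with the first token here only because no later token contains a digit. The variable names are illustrative only.

```python
import re

line = "STARC-1.1.1.5             ConsCase    WARNING    Warning"

m = re.search(r'[A-Z].+[0-9]', line)
# m is a match object (or None when nothing matches), not a string
if m is not None:
    first = m.group(0)   # the text that matched
    span = m.span()      # (start, end) offsets of the match within line
    print(first, span)

# [a-z] needs a lowercase letter with a digit somewhere after it;
# this line has no digit after any lowercase letter, so the result is None
lower = re.search(r'[a-z].+[0-9]', line)
print(lower)
```

Because re.search can return None, checking the result before calling group() avoids an AttributeError on lines that do not match.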
                                                                        
                                                        
            
            
              
                
粉色の甜心
2021-01-20 01:38

\s matches a whitespace character.

\S matches a non-whitespace character.

[...] matches a character in the set ....

[^...] matches a character not in the set ....

[^\S] matches a character that is not a non-whitespace character, i.e. it matches a whitespace character.
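The equivalence between [^\S] and \s can be checked directly. The sketch below uses a made-up sample string (an assumption, not taken from the question) and compares what each class picks out.

```python
import re

sample = "a b\tc\nd"   # hypothetical sample with a space, a tab and a newline

whitespace = re.findall(r'\s', sample)          # whitespace characters
non_whitespace = re.findall(r'\S', sample)      # everything else
double_negative = re.findall(r'[^\S]', sample)  # "not non-whitespace"

# [^\S] selects exactly the characters that \s selects
print(whitespace == double_negative)
print(non_whitespace)
```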
                                                                        
                                                        
            
            
              
                
离开以前
2021-01-20 01:47

[^\S] is a negated character class that is equivalent to \s (the whitespace pattern). *? is a lazy quantifier that matches zero or more characters, but as few as possible; when used at the end of the pattern it never actually matches any characters.

Replace your m = re.search('^[^\S]*?',line) line with
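A quick way to see the lazy quantifier giving up immediately is to compare it with its greedy counterpart. This is only a sketch; the input with leading spaces is an assumption chosen so that the two forms differ.

```python
import re

line = "   STARC-1.1.1.5   ConsCase"   # hypothetical input with leading spaces

lazy = re.search(r'^[^\S]*?', line)    # lazy: takes as little as possible
greedy = re.search(r'^[^\S]*', line)   # greedy: takes as much as possible

print(repr(lazy.group(0)))    # empty string, nothing forces it to consume more
print(repr(greedy.group(0)))  # the three leading spaces
```

On the original line, which has no leading whitespace, both forms match the empty string, which is why the original pattern appeared to return nothing useful.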

m = re.match(r'\S+',line)


or, if you also want to allow an empty-string match:

m = re.match(r'\S*',line)


The re.match method anchors the pattern at the start of the string. With re.search, you need to keep the ^ anchor at the start of the pattern:

m = re.search(r'^\S+',line)
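The effect of anchoring shows up only when the string does not start with a non-whitespace character. A minimal sketch, assuming a hypothetical input with leading spaces:

```python
import re

padded = "  STARC-1.1.1.5   ConsCase"   # hypothetical: note the leading spaces

anchored_match = re.match(r'\S+', padded)     # tries position 0 only
anchored_search = re.search(r'^\S+', padded)  # ^ pins the match to position 0
free_search = re.search(r'\S+', padded)       # may start anywhere in the string

print(anchored_match)        # None, because position 0 is a space
print(anchored_search)       # None, for the same reason
print(free_search.group(0))  # the first token, found after the spaces
```

Which behaviour is right depends on whether leading whitespace should count as "no first token" or simply be skipped.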


See the Python demo:

import re
line = "STARC-1.1.1.5             ConsCase    WARNING    Warning"
m = re.search(r'^\S+', line)
if m:
    print(m.group(0))
# => STARC-1.1.1.5


However, in this case a plain split() is enough:

res = line.split() 
print(res[0])
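As a rough comparison, the sketch below checks that split() and the regex agree on the question's line, and shows one case to watch for: an empty or whitespace-only string splits to an empty list, so indexing it directly would raise IndexError. The maxsplit argument and the guard are additions for illustration, not part of the answer above.

```python
import re

line = "STARC-1.1.1.5             ConsCase    WARNING    Warning"

first_by_split = line.split(maxsplit=1)[0]   # stop splitting after the first token
first_by_regex = re.search(r'\S+', line).group(0)
print(first_by_split == first_by_regex)

# A whitespace-only string yields [], so guard before taking element 0
tokens = "   ".split()
first_or_none = tokens[0] if tokens else None
print(first_or_none)
```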


                                                                        
                                                        
            
            
              
                
梦谈多话
2021-01-20 01:49

Replace your re.search call as below. \S matches a non-whitespace character, and + repeats it one or more times. re.search scans from the first character and returns the first match it finds.
import re
line = "STARC-1.1.1.5             ConsCase    WARNING    Warning"
m = re.search(r'\S+', line)
print(m.group(0))
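If this lookup is needed in several places, it could be wrapped in a small helper that also covers lines with no non-whitespace characters at all. The function name first_token is hypothetical, chosen only for this sketch.

```python
import re
from typing import Optional

def first_token(text: str) -> Optional[str]:
    """Return the first run of non-whitespace characters, or None if there is none."""
    m = re.search(r'\S+', text)
    return m.group(0) if m else None

print(first_token("STARC-1.1.1.5             ConsCase    WARNING    Warning"))
print(first_token("   \t  "))   # whitespace only, so there is nothing to return
```

Returning None instead of raising keeps the caller's check to a single "is not None" test.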

                                                                        
                                                        
            
            
              
                