Search in a string and obtain the 2 words before and after the match in Python


I'm using Python to search for some words (possibly multi-token) in a description (a string).

To do that I'm using a regex like this:

    result = re.search(w


                      
4 Answers

迷失自我, 2021-01-19 16:18

Try this regex:

    ((?:[a-z,]+\s+){0,2})here is\s+((?:[a-z,]+\s*){0,2})

Use it with re.findall and set the re.IGNORECASE flag.
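
A minimal sketch of how that pattern might be used, assuming the search term is the literal "here is" (the sample sentence is only illustrative). Each match yields a (before, after) pair of strings, which still have to be split into words. Note that the pattern requires whitespace after the term, so a match at the very end of the string is not found.

```python
import re

# Pattern from the answer above; "here is" is the literal search term.
pattern = r"((?:[a-z,]+\s+){0,2})here is\s+((?:[a-z,]+\s*){0,2})"
line = "Parking here is horrible, this shop sucks."

# findall returns one (before, after) tuple of strings per match.
matches = re.findall(pattern, line, re.IGNORECASE)

# Split each captured string into its individual words.
contexts = [(before.split(), after.split()) for before, after in matches]
print(contexts)  # [(['Parking'], ['horrible,', 'this'])]
```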
                                                                        
                                                        
            
            
              
                
暗喜, 2021-01-19 16:20

Based on your clarification, this becomes a bit more complicated. The solution below handles the case where the search pattern itself also appears among the two preceding or two following words.

    import re

    line = "Parking here is horrible, here is great here is mediocre here is here is "
    print(line)
    pattern = "here is"
    r = re.search(pattern, line, re.IGNORECASE)
    output = []
    if r:
        while line:
            # partition() is case-sensitive, unlike the re.search() check above
            before, match, line = line.partition(pattern)
            if match:
                if not output:
                    before = before.split()[-2:]
                else:
                    # re-attach the previous match so its words count as context
                    before = ' '.join([pattern, before]).split()[-2:]
                after = line.split()[:2]
                output.append((before, after))
    print(output)

The output for this example is:

    [(['Parking'], ['horrible,', 'here']), (['is', 'horrible,'], ['great', 'here']), (['is', 'great'], ['mediocre', 'here']), (['is', 'mediocre'], ['here', 'is']), (['here', 'is'], [])]
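
One possible variant, not part of the answer above: since str.partition is case-sensitive, the same idea can be sketched with re.finditer so that every occurrence is found regardless of case. The function name contexts and the parameter n are hypothetical, and n is assumed to be at least 1.

```python
import re

def contexts(line, term, n=2):
    """Return (words_before, words_after) for every occurrence of term."""
    results = []
    # re.escape treats the term as literal text, not as a regex.
    for m in re.finditer(re.escape(term), line, re.IGNORECASE):
        before = line[:m.start()].split()[-n:]  # last n words before the match
        after = line[m.end():].split()[:n]      # first n words after the match
        results.append((before, after))
    return results

line = "Parking here is horrible, here is great here is mediocre here is here is "
print(contexts(line, "here is"))
```

For the sample line this should give the same list as the output shown above, and it would also pick up spellings such as "Here is".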
                                                                        
                                                        
            
            
              
                
野性不改, 2021-01-19 16:35

I would do it like this (edit: added anchors to cover most cases):

    (\S+\s+|^)(\S+\s+|)here is(\s+\S+|)(\s+\S+|$)

This way you always get 4 groups (which may need to be trimmed), with the following behavior:

If group 1 is empty, there was no word before (group 2 is empty too)
If group 2 is empty, there was only one word before (group 1)
If groups 1 and 2 are both non-empty, they are the words before, in order
If group 3 is empty, there was no word after (group 4 is empty too)
If group 4 is empty, there was only one word after (group 3)
If groups 3 and 4 are both non-empty, they are the words after, in order
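
A rough sketch of how the four groups could be turned into word lists, assuming the literal term "here is" and a case-sensitive match. The helper name context is made up for illustration.

```python
import re

# Four-group pattern from the answer above.
pattern = re.compile(r"(\S+\s+|^)(\S+\s+|)here is(\s+\S+|)(\s+\S+|$)")

def context(line):
    """Return (words_before, words_after), or None if the term is absent."""
    m = pattern.search(line)
    if m is None:
        return None
    # Trim the surrounding whitespace captured with each word.
    words = [g.strip() for g in m.groups()]
    before = [w for w in words[:2] if w]  # groups 1 and 2, empty ones dropped
    after = [w for w in words[2:] if w]   # groups 3 and 4, empty ones dropped
    return before, after

print(context("Parking here is horrible, this shop sucks."))
# (['Parking'], ['horrible,', 'this'])
```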
                                                                        
                                                        
            
            
              
                
南笙, 2021-01-19 16:40

How about string operations?

    line = 'Parking here is horrible, this shop sucks.'

    before, term, after = line.partition('here is')
    before = before.rsplit(maxsplit=2)[-2:]
    after = after.split(maxsplit=2)[:2]

Result:

    >>> before
    ['Parking']
    >>> after
    ['horrible,', 'this']
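
The same string operations could be wrapped in a small helper that also covers the case where the term does not occur (partition then returns an empty middle element). This is only a sketch: the name words_around and the parameter n are hypothetical, and n is assumed to be at least 1.

```python
def words_around(line, term, n=2):
    """Return up to n words before and after the first occurrence of term."""
    before, found, after = line.partition(term)
    if not found:
        # partition() returns an empty separator when term is absent.
        return None
    return before.split()[-n:], after.split()[:n]

print(words_around('Parking here is horrible, this shop sucks.', 'here is'))
# (['Parking'], ['horrible,', 'this'])
```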

                                                                        
                                                        
            
            
              
                