How to combine multiple regular expressions into one line?


My script works fine doing this:

images = re.findall("src.\"(\S*?media.tumblr\S*?tumblr_\S*?jpg)", doc)
videos = re.findall("\S*?(http\S*?video_file\S


                      
2 answers

我寻月下人不归
2021-01-22 08:38

If you really want efficient...

For starters, I would cut out the leading \S*? in the second regex. It serves no purpose other than creating an opportunity for lots of backtracking.

src.\"(\S*?media.tumblr\S*?tumblr_\S*?jpg)|(http\S*?video_file\S*?tumblr_[a-zA-Z0-9]*)
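A minimal sketch of how this combined pattern behaves with re.findall. The doc string and its URLs are invented for illustration and are not from the question. Because the pattern has two capture groups, findall returns 2-tuples with one empty slot per match, so the results have to be split back into images and videos:

```python
import re

# Hypothetical sample input; the URLs are invented for illustration.
doc = (
    '<img src="https://68.media.tumblr.com/abc/tumblr_xyz_500.jpg"> '
    '<source src="https://www.tumblr.com/video_file/123/tumblr_AbC123">'
)

# Combined pattern from the answer: image alternative first, video second.
combined = re.compile(
    r'src.\"(\S*?media.tumblr\S*?tumblr_\S*?jpg)'
    r'|(http\S*?video_file\S*?tumblr_[a-zA-Z0-9]*)'
)

# Each match is a (group1, group2) tuple; the group that did not
# participate in the match comes back as an empty string.
pairs = combined.findall(doc)
images = [img for img, vid in pairs if img]
videos = [vid for img, vid in pairs if vid]
```

The single pass over doc is the point of combining the patterns; the list comprehensions are just one way to separate the two kinds of result afterwards.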


Other ideas

You can get rid of the capture groups by using a small lookbehind in the first alternative. That lets you drop all the parentheses and match exactly what you want. Not faster, but tidier:

(?<=src.\")\S*?media.tumblr\S*?tumblr_\S*?jpg|http\S*?video_file\S*?tumblr_[a-zA-Z0-9]*
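A small sketch of the lookbehind variant, again using an invented doc string. With no capture groups, re.findall returns a flat list of matched strings, images and videos mixed in document order. Python requires lookbehinds to be fixed width, which src.\" (five characters) satisfies:

```python
import re

# Hypothetical sample input; the URLs are invented for illustration.
doc = (
    '<img src="https://68.media.tumblr.com/abc/tumblr_xyz_500.jpg"> '
    '<source src="https://www.tumblr.com/video_file/123/tumblr_AbC123">'
)

# No capture groups: the lookbehind checks for src." without consuming it.
pattern = re.compile(
    r'(?<=src.\")\S*?media.tumblr\S*?tumblr_\S*?jpg'
    r'|http\S*?video_file\S*?tumblr_[a-zA-Z0-9]*'
)

# findall now yields plain strings instead of tuples.
matches = pattern.findall(doc)
```

If the two kinds of URL still need to be told apart, a check such as 'video_file' in m on each result is one option, at the cost of a second look at every match.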


Do you intend for the periods after src and media to mean "any character", or to mean "a literal period"? If the latter, escape them: \.
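To illustrate the difference, here is a short sketch with made-up test strings. Note that the period after src presumably has to stay unescaped, since it is what matches the = in src="...:

```python
import re

loose = re.compile(r'media.tumblr')    # "." matches any character except a newline
strict = re.compile(r'media\.tumblr')  # "\." matches only a literal period

# Made-up test strings for illustration.
loose_hits_dot = loose.search('media.tumblr') is not None
loose_hits_dash = loose.search('media-tumblr') is not None
strict_hits_dot = strict.search('media.tumblr') is not None
strict_hits_dash = strict.search('media-tumblr') is not None
```

Only the escaped form rejects media-tumblr; the unescaped form accepts both strings.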

You can use the re.IGNORECASE option and drop the uppercase range from the character class:

(?<=src.\")\S*?media.tumblr\S*?tumblr_\S*?jpg|http\S*?video_file\S*?tumblr_[a-z0-9]*
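A sketch of what the flag changes, using an invented video URL with mixed-case characters after tumblr_. Without re.IGNORECASE the shortened class [a-z0-9] stops at the first uppercase letter; with it, the full identifier is matched. Keep in mind that the flag makes the whole pattern case-insensitive, not just the character class:

```python
import re

# Same pattern text in both cases; only the flag differs.
regex_text = (
    r'(?<=src.\")\S*?media.tumblr\S*?tumblr_\S*?jpg'
    r'|http\S*?video_file\S*?tumblr_[a-z0-9]*'
)

# Hypothetical sample input; the URL is invented for illustration.
doc = '<source src="https://www.tumblr.com/video_file/123/tumblr_AbC123">'

with_flag = re.findall(regex_text, doc, re.IGNORECASE)
without_flag = re.findall(regex_text, doc)
```

Here with_flag contains the full URL, while without_flag is cut off right after tumblr_ because A is not in [a-z0-9].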

                                                                        
                                                        
            
            
              
                
长情又很酷
2021-01-22 08:51

As mentioned in the comments, a pipe (|) should do the trick.

The regular expression

(src.\"(\S*?media.tumblr\S*?tumblr_\S*?jpg))|(\S*?(http\S*?video_file\S*?tumblr_[a-zA-Z0-9]*))


catches either of the two patterns.
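One caveat: this version has four capture groups, so re.findall would return 4-tuples. A possible way around that, sketched here with an invented doc string, is to iterate with re.finditer and read only the inner groups (2 for images, 4 for videos):

```python
import re

# Hypothetical sample input; the URLs are invented for illustration.
doc = (
    '<img src="https://68.media.tumblr.com/abc/tumblr_xyz_500.jpg"> '
    '<source src="https://www.tumblr.com/video_file/123/tumblr_AbC123">'
)

# Pattern from the answer, split across two lines for readability.
pattern = re.compile(
    r'(src.\"(\S*?media.tumblr\S*?tumblr_\S*?jpg))'
    r'|(\S*?(http\S*?video_file\S*?tumblr_[a-zA-Z0-9]*))'
)

images, videos = [], []
for m in pattern.finditer(doc):
    # Groups that did not take part in the match are None.
    if m.group(2) is not None:
        images.append(m.group(2))
    elif m.group(4) is not None:
        videos.append(m.group(4))
```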

Demo on Regex Tester
                                                                        
                                                        
            
            
              
                