Regular Expressions in Python unexpectedly slow

迷失自我 2021-02-01 16:16

Consider this Python code:

import timeit
import re

def one():
    any(s in mystring for s in ('foo', 'bar', 'hello'))

r = re.compile('(foo|bar|hello


      
      
        
4 answers

旧时难觅i (original poster) 2021-02-01 16:25
                        
Note to future readers

I now think the correct answer is that Python's built-in string search is heavily optimized for this case, and the re module is simply somewhat slower. What I've written below is true, but it is probably not relevant to the simple regexps I have in the question.
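For anyone who wants to check this on their own machine, here is a minimal timing sketch. The function names and the two test strings are my own assumptions for illustration, not the code from the question, and the numbers will vary by Python version and hardware.

```python
import re
import timeit

# Hypothetical search terms and helper names, chosen for illustration only.
WORDS = ("foo", "bar", "hello")
PATTERN = re.compile("|".join(re.escape(word) for word in WORDS))


def contains_any_substring(text):
    """Check each word with the built-in substring operator."""
    return any(word in text for word in WORDS)


def contains_any_regex(text):
    """Check all words at once with a compiled alternation."""
    return PATTERN.search(text) is not None


if __name__ == "__main__":
    # Assumed inputs: one string that contains a word, one that does not.
    cases = (("match", "hello" * 1000), ("no match", "goodbye" * 1000))
    for label, text in cases:
        t_sub = timeit.timeit(lambda: contains_any_substring(text), number=1000)
        t_re = timeit.timeit(lambda: contains_any_regex(text), number=1000)
        print(f"{label}: substring={t_sub:.4f}s regex={t_re:.4f}s")
```

Both helpers return the same answer for any input; only the time taken differs, which is the point of the comparison.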

Original Answer

Apparently this is not a random fluke: Python's re module really is slower. It appears to use a recursive backtracking approach when it fails to find a match, rather than building a DFA and simulating it.

It uses the backtracking approach even when the regular expression contains no backreferences!

This means that in the worst case, Python regexes take exponential rather than linear time!
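To see the worst case for yourself, here is a small sketch using the well-known pathological pattern a?a?...a? followed by aa...a, matched against a string of n letters "a". The helper names are hypothetical and the values of n are kept small on purpose; I would expect the time to grow steeply as n increases on CPython's re, but exact timings depend on the interpreter version and machine.

```python
import re
import time


def pathological_pattern(n):
    """Build 'a?' repeated n times followed by 'a' repeated n times."""
    return "a?" * n + "a" * n


def time_match(n):
    """Match the pattern against 'a' * n; return (matched, seconds)."""
    pattern = re.compile(pathological_pattern(n))
    text = "a" * n
    start = time.perf_counter()
    matched = pattern.match(text) is not None
    return matched, time.perf_counter() - start


if __name__ == "__main__":
    # Small n only: a backtracking matcher may try a very large number
    # of ways to assign the optional 'a?' parts before it succeeds.
    for n in (10, 14, 18):
        matched, seconds = time_match(n)
        print(f"n={n:2d} matched={matched} seconds={seconds:.4f}")
```

The match always succeeds (every a? matches the empty string and the trailing run consumes the input), so any growth in the timings comes from how the engine searches, not from the answer.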

This very detailed paper describes the issue:
http://swtch.com/~rsc/regexp/regexp1.html

I think the graph near the end of that paper summarizes it succinctly.