Why does my Python regular expression pattern run so slowly?


Please see my regular expression pattern code:

#!/usr/bin/env python
# -*- coding:utf-8 -*-

import re

print('Start')
str1 = 'abcdefgasdsdfswossdfasdaef'
m =


                      
2 answers

名媛妹妹, 2021-01-22 09:21

See Runaway Regular Expressions: Catastrophic Backtracking.

In brief, if there are a great many ways to split a substring among the parts of the regex, the regex matcher may end up trying them all before it can report that the match fails.

Constructs like (x+)+ and x+x+ practically guarantee this behaviour.
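
A rough way to see this for yourself is to time a match that is bound to fail. The sketch below is only an illustration: the helper name `time_failed_match` and the trailing `y` are made up for the demo, and the absolute timings will differ from machine to machine. What matters is how the time for the nested pattern grows as the input gets longer.

```python
import re
import time

NESTED = re.compile(r'(x+)+y')  # nested quantifier: many ways to divide a run of x's
FLAT = re.compile(r'x+y')       # matches the same strings with a single quantifier


def time_failed_match(pattern, n):
    """Return the seconds spent failing to match n x's that have no trailing y."""
    text = 'x' * n
    start = time.perf_counter()
    result = pattern.match(text)
    elapsed = time.perf_counter() - start
    assert result is None  # there is no 'y', so the match has to fail
    return elapsed


if __name__ == '__main__':
    # Keep n small: the nested pattern gets dramatically slower with each step.
    for n in (8, 12, 16, 20):
        print(n, time_failed_match(NESTED, n), time_failed_match(FLAT, n))
```

Both patterns accept exactly the same inputs; only the amount of work done on a failed match differs.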

To detect and fix the problematic constructs, the following concept can be used:


At a conceptual level, the presence of a problematic construct means that your regex is ambiguous, i.e. if you disregard greedy/lazy behaviour, there is no single "correct" way to split some text into the parts of the regex (or, equivalently, of one of its subexpressions). So, to avoid or fix the problem, you need to find and eliminate all ambiguities.
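
To get a feel for how much ambiguity a pattern like (x+)+ carries, you can count the ways a run of n identical characters can be cut into one or more non-empty pieces, since each cut corresponds to one way the repeated group could divide the run. This is a small counting sketch, not a model of the regex engine itself; `count_splits` is a name invented for the illustration.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_splits(n):
    """Count the ways to cut a run of n characters into non-empty pieces."""
    if n == 0:
        return 1  # nothing left to cut: one completed split
    # The first piece takes k characters; the remainder is split recursively.
    return sum(count_splits(n - k) for k in range(1, n + 1))


if __name__ == '__main__':
    for n in (1, 2, 4, 8, 16):
        print(n, count_splits(n))  # 1, 2, 8, 128, 32768
```

The count doubles with every extra character, which is why a failing match can take so long on even a modest input.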


One way to do this is to:

- always split the text into its meaningful parts (parts that have separate meanings for the task at hand), and
- define the parts in such a way that they cannot be confused (using the same characteristics that you yourself would use to tell which is which if you were parsing it by hand).
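
As a sketch of the two points above, suppose the task is to match a line of whitespace-separated words ending in a colon. That task and both patterns are assumptions made up for this illustration; they are not taken from the question. The meaningful parts are "word", "separator" and "colon", and the second pattern defines them so that no character can belong to more than one part.

```python
import re

# Ambiguous: because the whitespace is optional, neighbouring repetitions
# of the group can divide a single word between them in many ways.
AMBIGUOUS = re.compile(r'^(\w+\s*)+:$')

# Unambiguous: words are separated by mandatory whitespace, so every
# character has exactly one part it can belong to.
UNAMBIGUOUS = re.compile(r'^\w+(?:\s+\w+)*\s*:$')

good = 'first second third:'
bad = 'first second third'  # no trailing colon, so the match must fail

if __name__ == '__main__':
    for pattern in (AMBIGUOUS, UNAMBIGUOUS):
        print(bool(pattern.match(good)), bool(pattern.match(bad)))
```

Both patterns are intended to accept the same inputs. The difference only shows when a match fails: the second pattern gives up quickly even on long input, while the first has to work through every way of dividing the words.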



                                                                        
                                                        
            
            
              
                
忘掉有多难, 2021-01-22 09:25

Reposting the solution given in the comments by nhahtdh and Marc B:

([A-Za-z\-\s\:\.]+)+ --> [A-Za-z\-\s\:\.]+
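
A quick check of that replacement, using the sample string from the question. The full pattern in the question is cut off, so the trailing `\d` below is only a hypothetical stand-in for "whatever comes next and fails to match"; the names `NESTED`, `FLAT` and `CHARS` are likewise made up for this sketch.

```python
import re

CHARS = r'[A-Za-z\-\s\:\.]+'
NESTED = re.compile('(' + CHARS + ')+')  # fragment before the fix
FLAT = re.compile(CHARS)                 # fragment after the fix

str1 = 'abcdefgasdsdfswossdfasdaef'  # sample string from the question

# On their own, both fragments match the same span of text.
assert NESTED.match(str1).group(0) == FLAT.match(str1).group(0) == str1

# Hypothetical continuation: a digit that never appears in str1.
nested_then_digit = re.compile('(' + CHARS + r')+\d')
flat_then_digit = re.compile(CHARS + r'\d')

# The flat version fails quickly on the whole string.
assert flat_then_digit.match(str1) is None

# The nested version is only tried on a short prefix here, because the
# time it needs grows very quickly with the length of the input.
assert nested_then_digit.match(str1[:14]) is None
```

Removing the outer group and its + does not change what the fragment matches; it only removes the many alternative ways of dividing the same characters.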

Thanks so much to nhahtdh and Marc B!
                                                                        
                                                        
            
            
              
                