How to match a string against a set of wildcard strings efficiently?


I am looking for a solution to match a single string against a set of wildcard strings. For example:

>>> match("ab", ["a*", "b*", "*", "c",


2 answers

逝去的感伤, 2021-01-14 17:25

It seems the Aho-Corasick algorithm would work here, and esmre seems to do what I'm looking for. I got this information from this question.
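
A minimal sketch of how that idea could be applied to the wildcard case, assuming the patterns contain only `*` as a wildcard (no `?` or `[...]`). Each pattern's longest literal run is treated as a required substring, a small Aho-Corasick automaton reports which of those substrings occur in the input, and only the surviving candidates are confirmed with `fnmatch.fnmatchcase`. The names `build_automaton`, `find_literals`, and `WildcardSet` are hypothetical; this is not the esmre API.

```python
from collections import deque
from fnmatch import fnmatchcase


def build_automaton(words):
    """Build Aho-Corasick tables (goto, fail, output) for non-empty words."""
    goto, fail, out = [{}], [0], [set()]
    for word in words:
        state = 0
        for ch in word:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append(set())
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].add(word)
    # Breadth-first pass: depth-1 states keep fail = 0, deeper states
    # inherit the longest proper suffix that is also a trie prefix.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] |= out[fail[nxt]]
    return goto, fail, out


def find_literals(text, automaton):
    """Return the set of indexed words that occur somewhere in text."""
    goto, fail, out = automaton
    state, found = 0, set()
    for ch in text:
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        found |= out[state]
    return found


class WildcardSet:
    """Prefilter '*'-only wildcard patterns by a required literal."""

    def __init__(self, patterns):
        self.always = []      # patterns with no literal text, e.g. "*"
        self.by_literal = {}  # required literal -> patterns that need it
        for pattern in patterns:
            literal = max(pattern.split("*"), key=len)
            if literal:
                self.by_literal.setdefault(literal, []).append(pattern)
            else:
                self.always.append(pattern)
        self.automaton = build_automaton(self.by_literal)

    def match(self, text):
        candidates = list(self.always)
        for literal in find_literals(text, self.automaton):
            candidates.extend(self.by_literal[literal])
        # The prefilter only narrows the set; fnmatchcase decides.
        return sorted(p for p in candidates if fnmatchcase(text, p))


print(WildcardSet(["a*", "b*", "*", "c"]).match("ab"))  # ['*', 'a*']
```

The automaton makes one pass over the input regardless of how many patterns are indexed, so the per-pattern `fnmatchcase` call runs only for patterns whose required literal actually appears.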

旧时难觅i, 2021-01-14 17:35

You could use the FilteredRE2 class from the re2 library, with help from an Aho-Corasick implementation (or something similar). From the re2 docs:


  Required substrings. Suppose you have an efficient way to check which
  of a list of strings appear as substrings in a large text (for
  example, maybe you implemented the Aho-Corasick algorithm), but now
  your users want to be able to do regular expression searches
  efficiently too. Regular expressions often have large literal strings
  in them; if those could be identified, they could be fed into the
  string searcher, and then the results of the string searcher could be
  used to filter the set of regular expression searches that are
  necessary. The FilteredRE2 class implements this analysis. Given a
  list of regular expressions, it walks the regular expressions to
  compute a boolean expression involving literal strings and then
  returns the list of strings. For example, FilteredRE2 converts
  (hello|hi)world[a-z]+foo into the boolean expression “(helloworld OR
  hiworld) AND foo” and returns those three strings. Given multiple
  regular expressions, FilteredRE2 converts each into a boolean
  expression and returns all the strings involved. Then, after being
  told which of the strings are present, FilteredRE2 can evaluate each
  expression to identify the set of regular expressions that could
  possibly be present. This filtering can reduce the number of actual
  regular expression searches significantly.
  
  The feasibility of these analyses depends crucially on the simplicity
  of their input. The first uses the DFA form, while the second uses the
  parsed regular expression (Regexp*). These kinds of analyses would be
  more complicated (maybe even impossible) if RE2 allowed non-regular
  features in its regular expressions.
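
To make the filtering step in the quoted passage concrete, here is a rough Python sketch. It does not use the re2 bindings or the real FilteredRE2 interface. The boolean expressions are written by hand as an AND of OR-groups (FilteredRE2 derives them automatically), a plain `in` check stands in for the Aho-Corasick string searcher, and the names `PREFILTERS`, `present_atoms`, `candidates`, and `search_all` are hypothetical.

```python
import re

# Regex -> hand-written prefilter. Every group must contribute at least one
# literal found in the text (AND of ORs). An empty list means the regex
# cannot be filtered and must always be tried.
PREFILTERS = {
    r"(hello|hi)world[a-z]+foo": [{"helloworld", "hiworld"}, {"foo"}],
    r"error: \d+": [{"error: "}],
    r"[a-z]+": [],
}

COMPILED = {pattern: re.compile(pattern) for pattern in PREFILTERS}
ATOMS = set().union(
    *(group for groups in PREFILTERS.values() for group in groups)
)


def present_atoms(text):
    """Stand-in for the fast multi-string searcher (e.g. Aho-Corasick)."""
    return {atom for atom in ATOMS if atom in text}


def candidates(present):
    """Regexes whose boolean expression is satisfied by the present atoms."""
    return [
        pattern
        for pattern, groups in PREFILTERS.items()
        if all(group & present for group in groups)
    ]


def search_all(text):
    """Run only the candidate regexes and return those that really match."""
    present = present_atoms(text)
    return [p for p in candidates(present) if COMPILED[p].search(text)]


print(search_all("say hiworldabcfoo"))
# ['(hello|hi)world[a-z]+foo', '[a-z]+']
```

The prefilter can let through a regex that does not match (for instance the text "foo hiworld" contains both required literals in the wrong order), which is why the final `search` call is still needed.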
