Apply fuzzy matching across a dataframe column and save results in a new column

I have two data frames, each with a different number of rows. Below are a couple of rows from each data set.

df1 =
     Company


                      
1 answer

青春惊慌失措, 2020-11-28 11:23

I couldn't tell what you were doing.  This is how I would do it.

import pandas as pd
from fuzzywuzzy import fuzz
from fuzzywuzzy import process


Create a series of tuples to compare:

compare = pd.MultiIndex.from_product([df1['Company'],
                                      df2['FDA Company']]).to_series()
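
To make the shape of compare concrete, here is a small sketch. The two frames and the company names in them are made up for illustration; only the column names Company and FDA Company come from the question.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df1 = pd.DataFrame({'Company': ['Acme Corp', 'Globex Inc']})
df2 = pd.DataFrame({'FDA Company': ['ACME Corporation', 'Globex', 'Initech LLC']})

# Every (df1 value, df2 value) pair, stored as a tuple under a two-level index.
compare = pd.MultiIndex.from_product([df1['Company'],
                                      df2['FDA Company']]).to_series()

print(len(compare))     # one entry per pair: 2 rows times 3 rows
print(compare.iloc[0])  # the first pair, as a plain tuple
```

The number of pairs is the product of the two row counts, so this approach is best suited to frames of modest size.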


Create a special function to calculate fuzzy metrics and return a series.

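def metrics(tup):
    # tup is one (Company, FDA Company) pair from compare
    return pd.Series([fuzz.ratio(*tup),             # character-level similarity
                      fuzz.token_sort_ratio(*tup)],  # similarity after sorting the words
                     ['ratio', 'token'])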
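
As a rough illustration of why it can help to compute both metrics: fuzz.ratio is sensitive to word order, while fuzz.token_sort_ratio sorts the words before comparing. The names below are made up, and the fallback functions are only standard-library stand-ins (an assumption for this sketch, not fuzzywuzzy's exact algorithm) so that the snippet still runs when fuzzywuzzy is not installed.

```python
try:
    from fuzzywuzzy import fuzz
    ratio, token_sort_ratio = fuzz.ratio, fuzz.token_sort_ratio
except ImportError:
    # Rough stand-ins built on difflib; not fuzzywuzzy's exact algorithm.
    from difflib import SequenceMatcher

    def ratio(a, b):
        return int(round(100 * SequenceMatcher(None, a, b).ratio()))

    def token_sort_ratio(a, b):
        def normalize(s):
            return ' '.join(sorted(s.lower().split()))
        return ratio(normalize(a), normalize(b))

# Hypothetical names: the same two words in a different order.
a, b = 'Acme Corp', 'Corp Acme'

plain = ratio(a, b)                     # order matters, so this is below 100
sorted_tokens = token_sort_ratio(a, b)  # words are sorted first, so order is ignored

print(plain, sorted_tokens)
```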


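Apply metrics to the compare series. Because metrics returns a series, the result is a dataframe with one row per pair and the columns ratio and token: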

compare.apply(metrics)




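There are a bunch of ways to do this next part:

For each row of df2, get the closest match from df1 (the result is indexed by FDA Company, with one column per metric):

compare.apply(metrics).unstack().idxmax().unstack(0)


For each row of df1, get the closest match from df2 (the result is indexed by Company, with one column per metric):

compare.apply(metrics).unstack(0).idxmax().unstack(0)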
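
Since the title asks for the result in a new column, one possible way to attach the best match to df1 is sketched below. The helper names add_best_match and simple_ratio, the new column names Best Match and Score, and the sample data are all hypothetical. The default scorer uses difflib.SequenceMatcher from the standard library as a stand-in so the sketch runs without extra packages; fuzz.ratio or fuzz.token_sort_ratio can be passed as scorer instead.

```python
import pandas as pd
from difflib import SequenceMatcher


def simple_ratio(a, b):
    # Standard-library stand-in for a fuzzy scorer, scaled to 0-100.
    return int(round(100 * SequenceMatcher(None, a, b).ratio()))


def add_best_match(df1, df2, scorer=simple_ratio):
    """Return a copy of df1 with the best-scoring 'FDA Company' and its score."""
    # unique() avoids duplicate pairs, which unstack() cannot reshape.
    pairs = pd.MultiIndex.from_product([df1['Company'].unique(),
                                        df2['FDA Company'].unique()])
    scores = pd.Series([scorer(a, b) for a, b in pairs], index=pairs)
    table = scores.unstack()  # rows: Company, columns: FDA Company
    out = df1.copy()
    out['Best Match'] = out['Company'].map(table.idxmax(axis=1))
    out['Score'] = out['Company'].map(table.max(axis=1))
    return out


# Hypothetical sample data; only the column names come from the question.
df1 = pd.DataFrame({'Company': ['Acme Corp', 'Globex Inc']})
df2 = pd.DataFrame({'FDA Company': ['Acme Corporation', 'Globex', 'Initech LLC']})

result = add_best_match(df1, df2)
print(result)
```

Keeping the score next to the match makes it easy to filter out weak matches afterwards, for example by keeping only rows whose Score is above a threshold you choose.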



                                                                        
                                                        
            
            
              
                