Find equal columns between two dataframes

后端未结

关注

 4  1091

I have two pandas data frames, a and b:

a1   a2   a3   a4   a5   a6   a7
1    3    4    5    3    4    5
0    2    0


                      
              相关标签:


      
      
        
          4条回答        

        
                         				            
            
           
            
                              
                
              
              
                
                  萌比男神i        
                
              
                            
                2020-12-29 07:03
              
            
            
                                                                       
One way of merge 

s=df1.T.reset_index().merge(df2.T.assign(match=lambda x : x.index))
dict(zip(s['index'],s['match']))
{'a1': 'b5', 'a2': 'b7', 'a3': 'b6', 'a4': 'b4', 'a5': 'b1', 'a6': 'b3', 'a7': 'b2'}

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  我在风中等你        
                
              
                            
                2020-12-29 07:04
              
            
            
                                                                       
Here's one way leveraging broadcasting to check for equality between both dataframes and taking all on the result to check where all rows match. Then we can obtain indexing arrays for both dataframe's column names from the result of np.where  (with @piR's contribution):

i, j = np.where((a.values[:,None] == b.values[:,:,None]).all(axis=0))
dict(zip(a.columns[j], b.columns[i]))
# {'a7': 'b2', 'a6': 'b3', 'a4': 'b4', 'a2': 'b7'}

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  不要未来只要你来        
                
              
                            
                2020-12-29 07:18
              
            
            
                                                                       
Here is a way using sort_values:

m=df1.T.sort_values(by=[*df1.index]).index
n=df2.T.sort_values(by=[*df2.index]).index
d=dict(zip(m,n))
print(d)




{'a1': 'b5', 'a5': 'b1', 'a2': 'b7', 'a3': 'b6', 'a6': 'b3', 'a7': 'b2', 'a4': 'b4'}

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  清酒与你        
                
              
                            
                2020-12-29 07:22
              
            
            
                                                                       
dictionary comprehensions

Use a tuple of the column values as the hashable key in a dictionary

d = {(*t,): c for c, t in df2.items()}
{c: d[(*t,)] for c, t in df1.items()}

{'a1': 'b5',
 'a2': 'b7',
 'a3': 'b6',
 'a4': 'b4',
 'a5': 'b1',
 'a6': 'b3',
 'a7': 'b2'}


Just in case we don't have perfect representation, I've only produced the dictionary for columns where there is a match.

d2 = {(*t,): c for c, t in df2.items()}
d1 = {(*t,): c for c, t in df1.items()}

{d1[c]: d2[c] for c in {*d1} & {*d2}}

{'a5': 'b1',
 'a2': 'b7',
 'a7': 'b2',
 'a6': 'b3',
 'a3': 'b6',
 'a1': 'b5',
 'a4': 'b4'}




idxmax

This borders on the absurd...  Don't actually do this.

{c: df2.T.eq(df1[c]).sum(1).idxmax() for c in df1}

{'a1': 'b5',
 'a2': 'b7',
 'a3': 'b6',
 'a4': 'b4',
 'a5': 'b1',
 'a6': 'b3',
 'a7': 'b2'}

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
                             
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复