Compare Multiple Columns to Get Rows that are Different in Two Pandas Dataframes

后端未结

关注

 4  763

I have two dataframes:

df1=
    A    B   C
0   A0   B0  C0
1   A1   B1  C1
2   A2   B2  C2

df2=
    A    B   C
0   A2   B2  C10
1   A1   B3  C11
2   A9   B4


                      
              相关标签:


      
      
        
          4条回答        

        
                         				            
            
           
            
                              
                
              
              
                
                  囚心锁ツ        
                
              
                            
                2021-01-14 17:22
              
            
            
                                                                       
Method ( 1 )



In [63]:
df1['A'].isin(df2['A']) & df1['B'].isin(df2['B'])
Out[63]:

0   False
1   False
2   True


Method ( 2 )



you can use the left merge to obtain values that exist in both frames + values that exist in the first data frame only

In [10]:
left = pd.merge(df1 , df2 , on = ['A' , 'B'] ,how = 'left')
left
Out[10]:
    A   B   C_x C_y
0   A0  B0  C0  NaN
1   A1  B1  C1  NaN
2   A2  B2  C2  C10


then of course values that exist only in the first frame will have NAN values in columns of the other data frame , then you can filter by this NAN values by doing the following

In [16]:
left.loc[pd.isnull(left['C_y']) , 'A':'C_x']
Out[16]:
    A   B   C_x
0   A0  B0  C0
1   A1  B1  C1

In [17]:


if you want to get whether the values in A exists in B you can do the following

In [20]:
pd.notnull(left['C_y'])
Out[20]:
0    False
1    False
2     True
Name: C_y, dtype: bool

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  深忆病人        
                
              
                            
                2021-01-14 17:23
              
            
            
                                                                       
Ideally, one would like to be able to just use ~df1[COLS].isin(df2[COLS]) as a mask, but this requires index labels to match (https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.isin.html)

Here is a succinct form that uses .isin but converts the second DataFrame to a dict so that index labels don't need to match:

COLS = ['A', 'B'] # or whichever columns to use for comparison

df1[~df1[COLS].isin(df2[COLS].to_dict(
    orient='list')).all(axis=1)]

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  滥情空心        
                
              
                            
                2021-01-14 17:28
              
            
            
                                                                       
 ~df1['A'].isin(df2['A'])


Should get you the series you want

df1[ ~df1['A'].isin(df2['A'])]


The dataframe:

    A   B   C
0   A0  B0  C0

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  甜味超标        
                
              
                            
                2021-01-14 17:34
              
            
            
                                                                       
If your version is 0.17.0 then you can use pd.merge and pass the cols of interest, how='left' and set indicator=True to whether the values are only present in left or both. You can then test whether the appended _merge col is equal to 'both':

In [102]:
pd.merge(df1, df2, on='A',how='left', indicator=True)['_merge'] == 'both'

Out[102]:
0    False
1     True
2     True
Name: _merge, dtype: bool

In [103]:
pd.merge(df1, df2, on=['A', 'B'],how='left', indicator=True)['_merge'] == 'both'

Out[103]:
0    False
1    False
2     True
Name: _merge, dtype: bool


output from the merge:

In [104]:
pd.merge(df1, df2, on='A',how='left', indicator=True)

Out[104]:
    A B_x C_x  B_y  C_y     _merge
0  A0  B0  C0  NaN  NaN  left_only
1  A1  B1  C1   B3  C11       both
2  A2  B2  C2   B2  C10       both

In [105]:    
pd.merge(df1, df2, on=['A', 'B'],how='left', indicator=True)

Out[105]:
    A   B C_x  C_y     _merge
0  A0  B0  C0  NaN  left_only
1  A1  B1  C1  NaN  left_only
2  A2  B2  C2  C10       both

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
                             
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复