Compare Multiple Columns to Get Rows that are Different in Two Pandas Dataframes

悲哀的现实 2021-01-14 16:38

I have two dataframes:

df1=
    A    B   C
0   A0   B0  C0
1   A1   B1  C1
2   A2   B2  C2

df2=
    A    B   C
0   A2   B2  C10
1   A1   B3  C11
2   A9   B4         


        
4 Answers
  • 2021-01-14 17:22

    Method ( 1 )


    In [63]:
    df1['A'].isin(df2['A']) & df1['B'].isin(df2['B'])
    Out[63]:
    
    0   False
    1   False
    2   True
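
One caveat with Method 1, shown as a minimal sketch: each `isin` call checks its column independently, so a row can pass even when its (A, B) pair never occurs together in df2. The frames below are made-up data for illustration, not the frames from the question, and the `MultiIndex` comparison is only one possible way to test the pairs row by row.

```python
import pandas as pd

# Hypothetical data chosen to expose the difference (not the question's frames).
df1 = pd.DataFrame({"A": ["A1", "A0"], "B": ["B2", "B0"]})
df2 = pd.DataFrame({"A": ["A1", "A2"], "B": ["B3", "B2"]})

cols = ["A", "B"]

# Column-wise check: "A1" appears somewhere in df2["A"] and "B2" appears
# somewhere in df2["B"], so row 0 passes although (A1, B2) is not a row of df2.
colwise = df1["A"].isin(df2["A"]) & df1["B"].isin(df2["B"])

# Row-wise check: compare whole (A, B) tuples instead of single columns.
rowwise = pd.Series(
    pd.MultiIndex.from_frame(df1[cols]).isin(pd.MultiIndex.from_frame(df2[cols])),
    index=df1.index,
)

print(colwise.tolist())
print(rowwise.tolist())
```

With the frames from the question both checks happen to agree, so the difference only matters when values can recombine across rows.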
    

    Method ( 2 )


    You can use a left merge to get the rows that exist in both frames plus the rows that exist only in the first frame:

    In [10]:
    left = pd.merge(df1 , df2 , on = ['A' , 'B'] ,how = 'left')
    left
    Out[10]:
        A   B   C_x C_y
    0   A0  B0  C0  NaN
    1   A1  B1  C1  NaN
    2   A2  B2  C2  C10
    

    Rows that exist only in the first frame will have NaN in the columns that come from the other frame, so you can filter on those NaN values as follows:

    In [16]:
    left.loc[pd.isnull(left['C_y']) , 'A':'C_x']
    Out[16]:
        A   B   C_x
    0   A0  B0  C0
    1   A1  B1  C1
    

    If you want a boolean mask showing whether each row of the first frame has a match in the second, you can do the following:

    In [20]:
    pd.notnull(left['C_y'])
    Out[20]:
    0    False
    1    False
    2     True
    Name: C_y, dtype: bool
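
A possible pitfall with filtering on `C_y`, sketched under assumptions: if a matching row in df2 has no value in C, the NaN test cannot tell it apart from an unmatched row. The df2 below is hypothetical (its third row was altered to match df1 and to carry a missing C), and `indicator=True` is used only as a point of comparison.

```python
import pandas as pd

df1 = pd.DataFrame({"A": ["A0", "A1", "A2"],
                    "B": ["B0", "B1", "B2"],
                    "C": ["C0", "C1", "C2"]})

# Hypothetical df2: the row (A0, B0) matches df1 but has a missing C value.
df2 = pd.DataFrame({"A": ["A2", "A1", "A0"],
                    "B": ["B2", "B3", "B0"],
                    "C": ["C10", "C11", None]})

left = pd.merge(df1, df2, on=["A", "B"], how="left", indicator=True)

# NaN-based test: flags row 0 as unmatched, although (A0, B0) is in df2.
by_nan = pd.isnull(left["C_y"])

# Indicator-based test: looks at the join itself, not at the payload column.
by_indicator = left["_merge"] == "left_only"

print(by_nan.tolist())
print(by_indicator.tolist())
```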
    
  • 2021-01-14 17:23

    Ideally, one would like to be able to just use ~df1[COLS].isin(df2[COLS]) as a mask, but this requires index labels to match (https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.isin.html)

    Here is a succinct form that uses .isin but converts the second DataFrame to a dict so that index labels don't need to match:

    COLS = ['A', 'B'] # or whichever columns to use for comparison
    
    df1[~df1[COLS].isin(df2[COLS].to_dict(
        orient='list')).all(axis=1)]
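
For reference, here is a self-contained run of that expression on the sample frames from the question. The blank C cell in df2 is represented as `None`, which is an assumption, and the intermediate names `lookup`, `mask` and `result` are only illustrative. Note that `isin` with a dict still tests each column's membership separately.

```python
import pandas as pd

df1 = pd.DataFrame({"A": ["A0", "A1", "A2"],
                    "B": ["B0", "B1", "B2"],
                    "C": ["C0", "C1", "C2"]})

# The last C value is blank in the question; None is an assumed stand-in.
df2 = pd.DataFrame({"A": ["A2", "A1", "A9"],
                    "B": ["B2", "B3", "B4"],
                    "C": ["C10", "C11", None]})

COLS = ["A", "B"]

# {'A': [...], 'B': [...]}: a dict carries no index, so labels need not match.
lookup = df2[COLS].to_dict(orient="list")

# True where every compared column of the df1 row has its value in df2's column.
mask = df1[COLS].isin(lookup).all(axis=1)

# Keep the df1 rows that are not covered by df2.
result = df1[~mask]
print(result)
```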
    
  • 2021-01-14 17:28
     ~df1['A'].isin(df2['A'])
    

    This should give you the boolean Series you want. Use it as a mask to select the rows:

    df1[ ~df1['A'].isin(df2['A'])]
    

    The resulting DataFrame:

        A   B   C
    0   A0  B0  C0
    
  • 2021-01-14 17:34

    If your pandas version is 0.17.0 or later, you can use pd.merge, pass the columns of interest with how='left', and set indicator=True to show whether each row is present only in the left frame or in both. You can then test whether the appended _merge column equals 'both':

    In [102]:
    pd.merge(df1, df2, on='A',how='left', indicator=True)['_merge'] == 'both'
    
    Out[102]:
    0    False
    1     True
    2     True
    Name: _merge, dtype: bool
    
    In [103]:
    pd.merge(df1, df2, on=['A', 'B'],how='left', indicator=True)['_merge'] == 'both'
    
    Out[103]:
    0    False
    1    False
    2     True
    Name: _merge, dtype: bool
    

    Output from the merge:

    In [104]:
    pd.merge(df1, df2, on='A',how='left', indicator=True)
    
    Out[104]:
        A B_x C_x  B_y  C_y     _merge
    0  A0  B0  C0  NaN  NaN  left_only
    1  A1  B1  C1   B3  C11       both
    2  A2  B2  C2   B2  C10       both
    
    In [105]:    
    pd.merge(df1, df2, on=['A', 'B'],how='left', indicator=True)
    
    Out[105]:
        A   B C_x  C_y     _merge
    0  A0  B0  C0  NaN  left_only
    1  A1  B1  C1  NaN  left_only
    2  A2  B2  C2  C10       both
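
The indicator approach can be wrapped into a small helper that returns the df1 rows with no match in df2. This is only a sketch: `rows_only_in_left` is a hypothetical name, the blank C cell in df2 is assumed to be missing (`None`), and the `drop_duplicates` step is an added precaution so that repeated keys in the right frame do not multiply rows in the merge.

```python
import pandas as pd

def rows_only_in_left(left, right, cols):
    """Return the rows of `left` whose values in `cols` have no match in `right`."""
    # Only the key columns are needed; dropping duplicates guarantees that the
    # left merge yields exactly one output row per row of `left`.
    keys = right[cols].drop_duplicates()
    merged = left.merge(keys, on=cols, how="left", indicator=True)
    # merge() builds a fresh 0..n-1 index, so apply the mask positionally
    # to keep the original index labels of `left`.
    mask = (merged["_merge"] == "left_only").to_numpy()
    return left[mask]

df1 = pd.DataFrame({"A": ["A0", "A1", "A2"],
                    "B": ["B0", "B1", "B2"],
                    "C": ["C0", "C1", "C2"]})
df2 = pd.DataFrame({"A": ["A2", "A1", "A9"],
                    "B": ["B2", "B3", "B4"],
                    "C": ["C10", "C11", None]})  # None: assumed missing value

print(rows_only_in_left(df1, df2, ["A", "B"]))
print(rows_only_in_left(df1, df2, ["A"]))
```

Comparing on A and B returns rows 0 and 1 of df1, while comparing on A alone returns only row 0, which matches the outputs shown above.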
    