Python Pandas - Find difference between two data frames

前端 未结 10 1837
陌清茗
陌清茗 2020-11-22 13:59

I have two data frames df1 and df2, where df2 is a subset of df1. How do I get a new data frame (df3) which is the difference between the two data frames?

In other w

10条回答
  •  忘了有多久
    2020-11-22 14:45

    By using drop_duplicates

    pd.concat([df1,df2]).drop_duplicates(keep=False)
    

    Update :

    Above method only working for those dataframes they do not have duplicate itself, For example

    df1=pd.DataFrame({'A':[1,2,3,3],'B':[2,3,4,4]})
    df2=pd.DataFrame({'A':[1],'B':[2]})
    

    It will output like below , which is wrong

    Wrong Output :

    pd.concat([df1, df2]).drop_duplicates(keep=False)
    Out[655]: 
       A  B
    1  2  3
    

    Correct Output

    Out[656]: 
       A  B
    1  2  3
    2  3  4
    3  3  4
    

    How to achieve that?

    Method 1: Using isin with tuple

    df1[~df1.apply(tuple,1).isin(df2.apply(tuple,1))]
    Out[657]: 
       A  B
    1  2  3
    2  3  4
    3  3  4
    

    Method 2: merge with indicator

    df1.merge(df2,indicator = True, how='left').loc[lambda x : x['_merge']!='both']
    Out[421]: 
       A  B     _merge
    1  2  3  left_only
    2  3  4  left_only
    3  3  4  left_only
    

提交回复
热议问题