How to count overlap rows among multiple dataframes?

庸人自扰 2021-01-19 15:44

I have multiple dataframes like the ones below.

df1 = pd.DataFrame({'Col1':["aaa","ffffd","ggg"],'Col2':["bbb","eee","hhh"],'Col3':["ccc","fff","


        
3 Answers
  •  情话喂你
    2021-01-19 16:38

    Setup:

    import pandas as pd

    df1 = pd.DataFrame({'Col1':["aaa","ffffd","ggg"],'Col2':["bbb","eee","hhh"],'Col3':["ccc","fff","iii"]})
    df2= pd.DataFrame({'Col1':["aaa","zzz","qqq"],'Col2':["bbb","xxx","eee"],'Col3':["ccc", "yyy","www"]})
    df3= pd.DataFrame({'Col1':["rrr","zzz","qqq","ppp"],'Col2':["ttt","xxx","eee","ttt"],'Col3':["yyy","yyy","www","qqq"]})
    

    Solution:

    First create an indicator column for each dataframe, then concat, groupby and take the max.

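    # Tag each dataframe with its own indicator column
    df1['df1'] = df2['df2'] = df3['df3'] = 1
    (
        pd.concat([df1, df2, df3], sort=False)
        .groupby(by=['Col1', 'Col2', 'Col3'])
        .max()
        .fillna(0)      # a row absent from a dataframe has NaN in that indicator
        .astype(int)
        .reset_index()
    )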
    
            Col1    Col2    Col3    df1 df2 df3
    0       aaa     bbb     ccc     1   1   0
    1       ffffd   eee     fff     1   0   0
    2       ggg     hhh     iii     1   0   0
    3       ppp     ttt     qqq     0   0   1
    4       qqq     eee     www     0   1   1
    5       rrr     ttt     yyy     0   0   1
    6       zzz     xxx     yyy     0   1   1
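
To get an actual overlap count out of the indicator table above, one option is to sum the indicator columns per row. The following is only a sketch: `overlap_table` and the `n_frames` column are hypothetical names introduced here, not part of pandas or the original answer, and it assumes a row counts as "overlapping" when it appears in more than one dataframe.

```python
import pandas as pd

df1 = pd.DataFrame({'Col1': ["aaa", "ffffd", "ggg"], 'Col2': ["bbb", "eee", "hhh"], 'Col3': ["ccc", "fff", "iii"]})
df2 = pd.DataFrame({'Col1': ["aaa", "zzz", "qqq"], 'Col2': ["bbb", "xxx", "eee"], 'Col3': ["ccc", "yyy", "www"]})
df3 = pd.DataFrame({'Col1': ["rrr", "zzz", "qqq", "ppp"], 'Col2': ["ttt", "xxx", "eee", "ttt"], 'Col3': ["yyy", "yyy", "www", "qqq"]})


def overlap_table(frames, keys):
    """Hypothetical helper: one row per distinct key combination, one 0/1
    indicator column per dataframe, plus n_frames = how many dataframes
    contain that row."""
    # Tag a copy of each dataframe with an indicator column named after it,
    # so the caller's dataframes are left unmodified.
    tagged = [
        df[keys].drop_duplicates().assign(**{name: 1})
        for name, df in frames.items()
    ]
    table = (
        pd.concat(tagged, sort=False)
        .groupby(by=keys)
        .max()
        .fillna(0)       # NaN means the row is absent from that dataframe
        .astype(int)
        .reset_index()
    )
    table["n_frames"] = table[list(frames)].sum(axis=1)
    return table


table = overlap_table({"df1": df1, "df2": df2, "df3": df3}, ["Col1", "Col2", "Col3"])

# Assumption: "overlap" means the row occurs in at least two dataframes.
overlap_count = int((table["n_frames"] > 1).sum())
print(table)
print(overlap_count)
```

With the sample data this reports the rows starting with aaa, qqq and zzz as shared by two dataframes each. Changing the threshold to `== len(frames)` would instead count rows present in every dataframe.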
    
