I have multiple DataFrames like the ones below.
Setup:
import pandas as pd
df1 = pd.DataFrame({'Col1': ["aaa", "ffffd", "ggg"], 'Col2': ["bbb", "eee", "hhh"], 'Col3': ["ccc", "fff", "iii"]})
df2 = pd.DataFrame({'Col1': ["aaa", "zzz", "qqq"], 'Col2': ["bbb", "xxx", "eee"], 'Col3': ["ccc", "yyy", "www"]})
df3 = pd.DataFrame({'Col1': ["rrr", "zzz", "qqq", "ppp"], 'Col2': ["ttt", "xxx", "eee", "ttt"], 'Col3': ["yyy", "yyy", "www", "qqq"]})
Solution:
First create an indicator column for each DataFrame, then concat, group by the original columns, and sum.
df1['df1'] = df2['df2'] = df3['df3'] = 1
(
pd.concat([df1, df2, df3], sort=False)
.groupby(by=['Col1', 'Col2', 'Col3'])
.sum().astype(int)  # missing indicators (NaN) sum to 0
.reset_index()
)
    Col1 Col2 Col3  df1  df2  df3
0    aaa  bbb  ccc    1    1    0
1  ffffd  eee  fff    1    0    0
2    ggg  hhh  iii    1    0    0
3    ppp  ttt  qqq    0    0    1
4    qqq  eee  www    0    1    1
5    rrr  ttt  yyy    0    0    1
6    zzz  xxx  yyy    0    1    1
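
If you have more than three DataFrames, the same indicator-column idea can be wrapped in a small helper. This is only a sketch: `membership_table`, `frames` and `keys` are names I made up for illustration, and it assumes the frame names do not clash with the key column names. It also drops duplicate rows inside each frame first, so a row repeated within one frame still shows as 1.

```python
import pandas as pd

def membership_table(frames, keys):
    """Hypothetical helper: frames is a dict {name: DataFrame}, keys is a list of columns."""
    tagged = [
        # keep only the key columns, count each distinct row once, tag with the frame name
        df[keys].drop_duplicates().assign(**{name: 1})
        for name, df in frames.items()
    ]
    return (
        pd.concat(tagged, sort=False)
        .groupby(keys)[list(frames)]
        .sum()          # indicators missing for a frame are NaN and sum to 0
        .astype(int)
        .reset_index()
    )

df1 = pd.DataFrame({'Col1': ["aaa", "ffffd", "ggg"], 'Col2': ["bbb", "eee", "hhh"], 'Col3': ["ccc", "fff", "iii"]})
df2 = pd.DataFrame({'Col1': ["aaa", "zzz", "qqq"], 'Col2': ["bbb", "xxx", "eee"], 'Col3': ["ccc", "yyy", "www"]})
df3 = pd.DataFrame({'Col1': ["rrr", "zzz", "qqq", "ppp"], 'Col2': ["ttt", "xxx", "eee", "ttt"], 'Col3': ["yyy", "yyy", "www", "qqq"]})

result = membership_table({"df1": df1, "df2": df2, "df3": df3}, ["Col1", "Col2", "Col3"])
print(result)
```

Unlike the chained assignment above, this version does not add columns to the original DataFrames, since `assign` returns a copy.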