I have a multiple dataframe like below.
df1 = pd.DataFrame({\'Col1\':[\"aaa\",\"ffffd\",\"ggg\"],\'Col2\':[\"bbb\",\"eee\",\"hhh\"],\'Col3\':\"ccc\",\"fff\",\"
Here is one way using concat
and get_dummies
:
l = [df1,df2,df3] #create a list of dataframes
final = pd.concat([i.assign(key=f"df{e+1}") for e,i in enumerate(l)],sort=False)
final = (final.assign(**pd.get_dummies(final.pop('key')))
.groupby(['Col1','Col2','Col3']).max().reset_index())
Col1 Col2 Col3 df1 df2 df3
0 aaa bbb ccc 1 1 0
1 ffffd eee fff 1 0 0
2 ggg hhh iii 1 0 0
3 ppp ttt qqq 0 0 1
4 qqq eee www 0 1 1
5 rrr ttt yyy 0 0 1
6 zzz xxx yyy 0 1 1
Using pandas.concat
and groupby
:
dfs = [df1,df2,df3]
dfs = [d.assign(df='df%s' % n) for n, d in enumerate(dfs, start=1)]
new_df = pd.concat(dfs).groupby(['Col1', 'Col2', 'Col3','df']).size().unstack(fill_value=0)
print(new_df)
Output:
df df1 df2 df3
Col1 Col2 Col3
aaa bbb ccc 1 1 0
ffffd eee fff 1 0 0
ggg hhh iii 1 0 0
ppp ttt qqq 0 0 1
qqq eee www 0 1 1
rrr ttt yyy 0 0 1
zzz xxx yyy 0 1 1
Setup:
df1 = pd.DataFrame({'Col1':["aaa","ffffd","ggg"],'Col2':["bbb","eee","hhh"],'Col3':["ccc","fff","iii"]})
df2= pd.DataFrame({'Col1':["aaa","zzz","qqq"],'Col2':["bbb","xxx","eee"],'Col3':["ccc", "yyy","www"]})
df3= pd.DataFrame({'Col1':["rrr","zzz","qqq","ppp"],'Col2':["ttt","xxx","eee","ttt"],'Col3':["yyy","yyy","www","qqq"]})
Solution:
First create a indicate column for each dataframe, then concat, groupby and sum.
df1['df1'] = df2['df2'] = df3['df3'] = 1
(
pd.concat([df1, df2, df3], sort=False)
.groupby(by=['Col1', 'Col2', 'Col3'])
.max().astype(int)
.reset_index()
)
Col1 Col2 Col3 df1 df2 df3
0 aaa bbb ccc 1 1 0
1 ffffd eee fff 1 0 0
2 ggg hhh iii 1 0 0
3 ppp ttt qqq 0 0 1
4 qqq eee www 0 1 1
5 rrr ttt yyy 0 0 1
6 zzz xxx yyy 0 1 1