I am new to using DataFrame and I would like to know how to perform a SQL equivalent of left outer join on multiple columns on a series of tables
Example:
One can also do this with a compact version of @TomAugspurger's answer, like so:
df = df1.merge(df2, how='left', on=['Year', 'Week', 'Colour']).merge(df3[['Week', 'Colour', 'Val3']], how='left', on=['Week', 'Colour'])
Merge them in two steps, df1
and df2
first, and then the result of that to df3
.
In [33]: s1 = pd.merge(df1, df2, how='left', on=['Year', 'Week', 'Colour'])
I dropped year from df3 since you don't need it for the last join.
In [39]: df = pd.merge(s1, df3[['Week', 'Colour', 'Val3']],
how='left', on=['Week', 'Colour'])
In [40]: df
Out[40]:
Year Week Colour Val1 Val2 Val3
0 2014 A Red 50 NaN NaN
1 2014 B Red 60 NaN 60
2 2014 B Black 70 100 10
3 2014 C Red 10 20 NaN
4 2014 D Green 20 NaN 20
[5 rows x 6 columns]