Pandas left outer join multiple dataframes on multiple columns

再見小時候 2020-11-28 03:59

I am new to using DataFrame and I would like to know how to perform the SQL equivalent of a left outer join on multiple columns across a series of tables.

Example:

         


        
2 Answers
  • 2020-11-28 04:12

    One can also do this with a compact version of @TomAugspurger's answer, like so:

    df = (
        df1.merge(df2, how='left', on=['Year', 'Week', 'Colour'])
           .merge(df3[['Week', 'Colour', 'Val3']], how='left', on=['Week', 'Colour'])
    )
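
For a longer series of tables, the same chaining can be written as a loop over (frame, keys) pairs. This is only a minimal sketch: the helper name `left_join_all` is hypothetical, and the sample frames are made up for illustration, since the question's original tables are not shown here.

```python
import pandas as pd

# Hypothetical sample data: the question's original tables are not shown,
# so these values are made up for illustration only.
df1 = pd.DataFrame({
    "Year": [2014, 2014, 2014],
    "Week": ["A", "B", "C"],
    "Colour": ["Red", "Red", "Black"],
    "Val1": [50, 60, 70],
})
df2 = pd.DataFrame({
    "Year": [2014, 2014],
    "Week": ["B", "C"],
    "Colour": ["Red", "Black"],
    "Val2": [100, 20],
})
df3 = pd.DataFrame({
    "Week": ["A", "C"],
    "Colour": ["Red", "Black"],
    "Val3": [10, 30],
})


def left_join_all(base, joins):
    """Left-join each (frame, keys) pair onto `base`, in the order given.

    Hypothetical helper, not part of pandas.
    """
    result = base
    for frame, keys in joins:
        result = result.merge(frame, how="left", on=keys)
    return result


joined = left_join_all(
    df1,
    [
        (df2, ["Year", "Week", "Colour"]),  # join on all three keys
        (df3, ["Week", "Colour"]),          # join on two keys only
    ],
)
print(joined)
```

Each table can carry its own key list, so tables that lack one of the key columns (like the made-up `df3` above) still fit the same pattern.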
    
  • 2020-11-28 04:28

    Merge them in two steps: first df1 with df2, then the result of that with df3.

    In [33]: s1 = pd.merge(df1, df2, how='left', on=['Year', 'Week', 'Colour'])
    

    I dropped Year from df3 since you don't need it for the last join.

    In [39]: df = pd.merge(s1, df3[['Week', 'Colour', 'Val3']],
                           how='left', on=['Week', 'Colour'])
    
    In [40]: df
    Out[40]: 
       Year Week Colour  Val1  Val2 Val3
    0  2014    A    Red    50   NaN  NaN
    1  2014    B    Red    60   NaN   60
    2  2014    B  Black    70   100   10
    3  2014    C    Red    10    20  NaN
    4  2014    D  Green    20   NaN   20
    
    [5 rows x 6 columns]
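
Because the second join ignores Year, a (Week, Colour) pair that appears more than once in df3 would duplicate the matching rows on the left. The sketch below reruns the two steps as a standalone script and passes `validate="many_to_one"` so that pandas raises an error in that case. The input frames are hypothetical, chosen only so that the result lines up with the output shown above; they are not the question's original data.

```python
import pandas as pd

# Hypothetical inputs, picked so the result matches the output shown above.
df1 = pd.DataFrame({
    "Year": [2014] * 5,
    "Week": ["A", "B", "B", "C", "D"],
    "Colour": ["Red", "Red", "Black", "Red", "Green"],
    "Val1": [50, 60, 70, 10, 20],
})
df2 = pd.DataFrame({
    "Year": [2014, 2014],
    "Week": ["B", "C"],
    "Colour": ["Black", "Red"],
    "Val2": [100, 20],
})
df3 = pd.DataFrame({
    "Year": [2013, 2013, 2013],
    "Week": ["B", "B", "D"],
    "Colour": ["Red", "Black", "Green"],
    "Val3": [60, 10, 20],
})

# Step 1: left join on all three key columns.
s1 = pd.merge(df1, df2, how="left", on=["Year", "Week", "Colour"])

# Step 2: take only the join keys and the value column from df3, so its
# Year column does not collide with the Year column already in s1.
# validate="many_to_one" asks pandas to raise a MergeError if any
# (Week, Colour) pair occurs more than once in the right-hand frame.
df = pd.merge(
    s1,
    df3[["Week", "Colour", "Val3"]],
    how="left",
    on=["Week", "Colour"],
    validate="many_to_one",
)
print(df)
```

Rows without a match keep their left-hand values and get NaN in the joined column, which is the behaviour of a SQL left outer join.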
    