Union of more than two pandas DataFrames

死守一世寂寞 2021-02-20 04:38

I am trying to convert a SQL query to Python. The SQL statement is as follows:

select * from table 1 
union
select * from table 2
union 
select * from table 3
un         


        
3 answers
  • 2021-02-20 04:56

    If I understand the issue correctly, you are looking for the concat function.

    pandas.concat([df1, df2, df3, df4]) should work correctly if the column names are the same in all of the dataframes.
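One caveat worth spelling out: concat on its own keeps duplicate rows, so it behaves like SQL UNION ALL, whereas UNION also removes duplicates. A minimal sketch of both, assuming small hypothetical frames (the column names `id` and `name` are made up for illustration and are not from the question):

```python
import pandas as pd

# Hypothetical sample frames; df1 and df2 share one identical row (2, "b").
df1 = pd.DataFrame({"id": [1, 2], "name": ["a", "b"]})
df2 = pd.DataFrame({"id": [2, 3], "name": ["b", "c"]})
df3 = pd.DataFrame({"id": [4], "name": ["d"]})

# UNION ALL: stack the rows and keep duplicates.
union_all = pd.concat([df1, df2, df3], ignore_index=True)

# UNION: stack the rows, then drop rows that are exact duplicates.
union = union_all.drop_duplicates().reset_index(drop=True)

print(union)
```

drop_duplicates compares whole rows by default, which is the closest match to how UNION treats rows.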

  • 2021-02-20 05:00

    IIUC, you can use merge to join all of the dataframes on the shared column matching_col:

    import pandas as pd
    
    # Merge multiple dataframes
    df1 = pd.DataFrame({"matching_col": pd.Series({1: 4, 2: 5, 3: 7}), 
                        "a": pd.Series({1: 52, 2: 42, 3:7})}, columns=['matching_col','a'])
    print(df1)
       matching_col   a
    1             4  52
    2             5  42
    3             7   7
    
    df2 = pd.DataFrame({"matching_col": pd.Series({1: 2, 2: 7, 3: 8}), 
                        "a": pd.Series({1: 62, 2: 28, 3:9})}, columns=['matching_col','a'])
    print(df2)
       matching_col   a
    1             2  62
    2             7  28
    3             8   9
    
    df3 = pd.DataFrame({"matching_col": pd.Series({1: 1, 2: 0, 3: 7}), 
                        "a": pd.Series({1: 28, 2: 52, 3:3})}, columns=['matching_col','a'])
    print(df3)
       matching_col   a
    1             1  28
    2             0  52
    3             7   3
    
    df4 = pd.DataFrame({"matching_col": pd.Series({1: 4, 2: 9, 3: 7}), 
                        "a": pd.Series({1: 27, 2: 24, 3:7})}, columns=['matching_col','a'])
    print(df4)
       matching_col   a
    1             4  27
    2             9  24
    3             7   7
    

    Solution1:

    df = pd.merge(pd.merge(pd.merge(df1,df2,on='matching_col'),df3,on='matching_col'), df4, on='matching_col')
    # set column names
    df.columns = ['matching_col','a1','a2','a3','a4']
    print(df)
    
       matching_col  a1  a2  a3  a4
    0             7   7  28   3   7
    

    Solution2:

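    from functools import reduce  # reduce is not a builtin in Python 3

    dfs = [df1, df2, df3, df4]
    # fold the list into one dataframe, merging pairwise on matching_col
    df = reduce(lambda left,right: pd.merge(left,right,on='matching_col'), dfs)
    # set column names
    df.columns = ['matching_col','a1','a2','a3','a4']
    print(df)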
    
       matching_col  a1  a2  a3  a4
    0             7   7  28   3   7
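A note on the chained merges: every frame has a column named a, so the default _x/_y suffixes end up producing repeated column names by the fourth frame, and newer pandas versions (2.0 and later, as far as I know) reject that with an error. A sketch of one way around it is to rename a in each frame before merging. The data is the same as df1 to df4, rebuilt here so the snippet runs on its own; the names `frames` and `renamed` are only illustrative:

```python
from functools import reduce

import pandas as pd

# Same values as df1..df4 above, with a default 0-based index.
frames = [
    pd.DataFrame({"matching_col": [4, 5, 7], "a": [52, 42, 7]}),
    pd.DataFrame({"matching_col": [2, 7, 8], "a": [62, 28, 9]}),
    pd.DataFrame({"matching_col": [1, 0, 7], "a": [28, 52, 3]}),
    pd.DataFrame({"matching_col": [4, 9, 7], "a": [27, 24, 7]}),
]

# Rename "a" to a1..a4 up front so no merge sees two columns with the same name.
renamed = [f.rename(columns={"a": f"a{i}"}) for i, f in enumerate(frames, start=1)]

# Inner-merge the frames pairwise on the shared key column.
merged = reduce(lambda left, right: pd.merge(left, right, on="matching_col"), renamed)
print(merged)
```

This also removes the need to assign df.columns afterwards, since each column already carries its final name.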
    

    But if you only need to concatenate the dataframes, use concat and reset the index with the parameter ignore_index=True:

    print(pd.concat([df1, df2, df3, df4], ignore_index=True))
    
        matching_col   a
    0              4  52
    1              5  42
    2              7   7
    3              2  62
    4              7  28
    5              8   9
    6              1  28
    7              0  52
    8              7   3
    9              4  27
    10             9  24
    11             7   7
    
  • 2021-02-20 05:10

    This should be a comment on Jezrael's answer (+1'd for merge over concat), but I don't have enough reputation.

    The OP asked how to union the dfs, but merge returns the intersection (an inner join) by default: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.merge.html#pandas.merge

    To get unions, add how='outer' to the merge calls.
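To make the difference concrete, here is a small sketch comparing the default inner merge with how='outer'. The two frames and the column names a1 and a2 are assumptions for illustration. Note that an outer merge gives the union of the key values (like a SQL FULL OUTER JOIN), with NaN where a frame has no row for a key; it does not stack rows the way SQL UNION does.

```python
import pandas as pd

# Hypothetical frames whose keys only partly overlap (7 is the shared key).
left = pd.DataFrame({"matching_col": [4, 5, 7], "a1": [52, 42, 7]})
right = pd.DataFrame({"matching_col": [2, 7, 8], "a2": [62, 28, 9]})

# Default how="inner": only keys present in both frames survive.
inner = pd.merge(left, right, on="matching_col")

# how="outer": keys present in either frame; missing values become NaN.
outer = pd.merge(left, right, on="matching_col", how="outer")

print(inner)
print(outer)
```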
