pandas three-way joining multiple dataframes on columns

醉梦人生 2020-11-22 08:35

I have 3 CSV files. Each has the first column as the (string) names of people, while all the other columns in each dataframe are attributes of that person.

How can

11 answers
  • 2020-11-22 09:03

    Assumed imports:

    import pandas as pd
    

    John Galt's answer is basically a reduce operation. If I have more than a handful of dataframes, I'd put them in a list like this (generated via list comprehensions or loops or whatnot):

    dfs = [df0, df1, df2, dfN]
    

    Assuming they have some common column, like name in your example, I'd do the following:

    df_final = reduce(lambda left, right: pd.merge(left, right, on='name'), dfs)
    

    That way, your code should work with whatever number of dataframes you want to merge.

    Edit August 1, 2016: In Python 3, reduce is no longer a built-in; it has been moved into functools. To use it, import it from that module first:

    from functools import reduce
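
A minimal, self-contained sketch of the reduce approach described above. The three small frames and their column names (age, city, score) are made up for illustration. It also shows one thing worth keeping in mind: pd.merge defaults to an inner join, so a name missing from any one frame drops out of the result unless how='outer' is passed.

```python
from functools import reduce

import pandas as pd

# Hypothetical sample frames sharing a 'name' column.
df0 = pd.DataFrame({'name': ['a', 'b', 'c'], 'age': [31, 45, 27]})
df1 = pd.DataFrame({'name': ['a', 'b', 'c'], 'city': ['Oslo', 'Lima', 'Pune']})
df2 = pd.DataFrame({'name': ['a', 'b'], 'score': [0.5, 0.9]})

dfs = [df0, df1, df2]

# pd.merge defaults to how='inner', so 'c' (absent from df2) is dropped.
df_inner = reduce(lambda left, right: pd.merge(left, right, on='name'), dfs)

# how='outer' keeps every name and fills the missing values with NaN.
df_outer = reduce(
    lambda left, right: pd.merge(left, right, on='name', how='outer'), dfs
)

print(df_inner)
print(df_outer)
```

Which join type is right depends on whether every person is expected to appear in every file.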
    
  • 2020-11-22 09:07

    I tweaked the accepted answer so that each merge can use its own suffixes parameter, again using reduce. I guess it can be extended to different on parameters as well.

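    from functools import reduce

    import pandas as pd

    # Each remaining dataframe is paired with the suffixes tuple for its merge.
    dfs_with_suffixes = [(df2, suffix2), (df3, suffix3),
                         (df4, suffix4)]

    # 'col1', 'col2', ... stand for the shared key columns.
    merge_one = lambda x, y, sfx: pd.merge(x, y, on=['col1', 'col2'], suffixes=sfx)

    # df1 is the initial value; each step merges in the next (df, suffixes) pair.
    merged = reduce(lambda left, right: merge_one(left, *right), dfs_with_suffixes, df1)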
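
To make the per-merge suffixes idea concrete, here is a small runnable sketch. The frames, the shared score column and the suffix values are assumptions chosen for illustration. Each frame carries a column with the same name, and the suffix pair decides how the clash is resolved at each step.

```python
from functools import reduce

import pandas as pd

# Hypothetical frames that all carry a 'score' column besides the key.
df1 = pd.DataFrame({'name': ['a', 'b'], 'score': [1, 2]})
df2 = pd.DataFrame({'name': ['a', 'b'], 'score': [3, 4]})
df3 = pd.DataFrame({'name': ['a', 'b'], 'score': [5, 6]})

# Each right-hand frame is paired with the suffixes for its merge step.
# The empty left suffix leaves the already-merged columns unchanged.
dfs_with_suffixes = [(df2, ('', '_2')), (df3, ('', '_3'))]


def merge_one(left, right, suffixes):
    """Merge two frames on 'name', renaming clashing columns with suffixes."""
    return pd.merge(left, right, on='name', suffixes=suffixes)


# df1 seeds the reduction; every step unpacks one (frame, suffixes) pair.
merged = reduce(lambda left, pair: merge_one(left, *pair), dfs_with_suffixes, df1)
print(merged)
```

Suffixes are only applied to column names that actually overlap, so the choice of pairs has to account for what the accumulated left-hand frame looks like at each step.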
    
  • 2020-11-22 09:08

    In Python 3.6.3 with pandas 0.22.0 you can also use concat, as long as you first set the columns you want to join on as the index:

    pd.concat(
        (iDF.set_index('name') for iDF in [df1, df2, df3]),
        axis=1, join='inner'
    ).reset_index()
    

    Here df1, df2, and df3 are defined as in John Galt's answer:

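    import numpy as np
    import pandas as pd

    # Note: np.array coerces these mixed rows to strings, so the attr columns hold str values.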
    df1 = pd.DataFrame(np.array([
        ['a', 5, 9],
        ['b', 4, 61],
        ['c', 24, 9]]),
        columns=['name', 'attr11', 'attr12']
    )
    df2 = pd.DataFrame(np.array([
        ['a', 5, 19],
        ['b', 14, 16],
        ['c', 4, 9]]),
        columns=['name', 'attr21', 'attr22']
    )
    df3 = pd.DataFrame(np.array([
        ['a', 15, 49],
        ['b', 4, 36],
        ['c', 14, 9]]),
        columns=['name', 'attr31', 'attr32']
    )
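
A short sketch of how the join argument changes the concat result. The frames reuse a trimmed subset of the sample values, and df3 deliberately lacks the name 'c' (an assumption made for illustration) so that the two join modes differ.

```python
import pandas as pd

df1 = pd.DataFrame({'name': ['a', 'b', 'c'], 'attr11': [5, 4, 24]})
df2 = pd.DataFrame({'name': ['a', 'b', 'c'], 'attr21': [5, 14, 4]})
df3 = pd.DataFrame({'name': ['a', 'b'], 'attr31': [15, 4]})  # 'c' is missing

# concat with axis=1 aligns rows on the index, so move 'name' there first.
frames = [df.set_index('name') for df in (df1, df2, df3)]

# join='inner' keeps only the names present in every frame.
inner = pd.concat(frames, axis=1, join='inner').reset_index()

# join='outer' keeps all names and fills the gaps with NaN.
outer = pd.concat(frames, axis=1, join='outer').reset_index()

print(inner)
print(outer)
```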
    
  • 2020-11-22 09:10

    This can also be done as follows for a list of dataframes df_list:

    df = df_list[0]
    for df_ in df_list[1:]:
        df = df.merge(df_, on='join_col_name')
    

    Or, if the dataframes come from a generator object (e.g. to reduce memory consumption):

    df = next(df_list)
    for df_ in df_list:
        df = df.merge(df_, on='join_col_name')
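
A runnable sketch of the generator variant, assuming the data arrives as CSV text. The in-memory strings and the iter_frames helper are hypothetical stand-ins for real files. Note that next() needs an iterator: a plain list would have to be wrapped in iter() first.

```python
import io

import pandas as pd

# Hypothetical in-memory CSV texts standing in for files on disk.
csv_texts = [
    "name,age\na,31\nb,45\n",
    "name,city\na,Oslo\nb,Lima\n",
    "name,score\na,0.5\nb,0.9\n",
]


def iter_frames(texts):
    """Yield one DataFrame at a time instead of building a list up front."""
    for text in texts:
        yield pd.read_csv(io.StringIO(text))


frames = iter_frames(csv_texts)

# next() consumes the first frame; the loop then sees only the remaining ones.
df = next(frames)
for df_ in frames:
    df = df.merge(df_, on='name')

print(df)
```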
    
  • 2020-11-22 09:11

    This is an ideal situation for the join method.

    The join method is built for exactly this kind of situation. You can join any number of DataFrames together with it. The calling DataFrame is joined, index to index, with each DataFrame in the collection you pass. To work with multiple DataFrames this way, you must put the joining columns in the index.

    The code would look something like this:

    filenames = ['fn1', 'fn2', 'fn3', 'fn4']  # ... and so on
    index_col = 0  # the first column holds the names, as described in the question
    dfs = [pd.read_csv(filename, index_col=index_col) for filename in filenames]
    dfs[0].join(dfs[1:])
    

    With @zero's data, you could do this:

    import numpy as np
    import pandas as pd

    df1 = pd.DataFrame(np.array([
        ['a', 5, 9],
        ['b', 4, 61],
        ['c', 24, 9]]),
        columns=['name', 'attr11', 'attr12'])
    df2 = pd.DataFrame(np.array([
        ['a', 5, 19],
        ['b', 14, 16],
        ['c', 4, 9]]),
        columns=['name', 'attr21', 'attr22'])
    df3 = pd.DataFrame(np.array([
        ['a', 15, 49],
        ['b', 4, 36],
        ['c', 14, 9]]),
        columns=['name', 'attr31', 'attr32'])
    
    dfs = [df1, df2, df3]
    dfs = [df.set_index('name') for df in dfs]
    dfs[0].join(dfs[1:])
    
         attr11 attr12 attr21 attr22 attr31 attr32
    name                                          
    a         5      9      5     19     15     49
    b         4     61     14     16      4     36
    c        24      9      4      9     14      9
    