Pandas: merge multiple dataframes and control column names?

既然无缘 2021-02-06 12:03

I would like to merge nine pandas DataFrames into a single DataFrame, joining on two columns and controlling the resulting column names. Is this possible?

I have nine d

3 answers
  • 2021-02-06 12:23

    Would doing a big pd.concat() and then renaming all the columns work for you? Something like:

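    import pandas as pd

    # dfs: list of DataFrames, each with org, name, items, spend columns.
    # pd.concat(axis=1) aligns rows by index, not by org/name, so this only
    # works if every frame lists the same (org, name) rows in the same order.
    desired_columns = ['spend', 'items']
    big_df = pd.concat([dfs[0][['org', 'name'] + desired_columns]]
                       + [df[desired_columns] for df in dfs[1:]], axis=1)

    # Rename positionally: org, name, then spend/items for each frame.
    new_columns = ['org', 'name']
    for i in range(len(dfs)):
        new_columns.extend(['spend_df%i' % i, 'items_df%i' % i])

    big_df.columns = new_columns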
    

    This should give you columns like:

    org, name, spend_df0, items_df0, spend_df1, items_df1, ..., spend_df8, items_df8
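If the frames are not guaranteed to share the same row order, a variant of the same concat idea is to move org and name into the index first, so that `pd.concat(axis=1)` aligns rows on those keys, and let `add_suffix` build the column names. This is a sketch, not a drop-in solution: the helper name `concat_on_keys` and the sample frames are hypothetical, and it assumes each frame has at most one row per (org, name) pair, since duplicate keys can make the alignment fail.

```python
import pandas as pd


def concat_on_keys(dfs, keys=('org', 'name'), value_cols=('spend', 'items')):
    """Hypothetical helper: align frames on `keys`, suffix columns _df0, _df1, ..."""
    parts = [
        # Index by the join keys so concat aligns rows by key, not by position.
        df.set_index(list(keys))[list(value_cols)].add_suffix(f'_df{i}')
        for i, df in enumerate(dfs)
    ]
    # axis=1 with the default join='outer' keeps keys missing from some frames (NaN).
    return pd.concat(parts, axis=1).reset_index()


# Hypothetical sample data: same keys, different row order.
df_a = pd.DataFrame({'org': [1, 2], 'name': ['x', 'y'], 'items': [3, 4], 'spend': [5, 6]})
df_b = pd.DataFrame({'org': [2, 1], 'name': ['y', 'x'], 'items': [7, 8], 'spend': [9, 10]})
print(concat_on_keys([df_a, df_b]))
```

The resulting columns follow the same org, name, spend_df0, items_df0, ... layout as above.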

  • 2021-02-06 12:35

    I've wanted this at times as well, but have been unable to find a built-in pandas way of doing it. Here is my suggestion (and my plan for the next time I need it):

    1. Create an empty dictionary, merge_dict.
    2. For each of your data frames, loop through the index you want to join on and add the desired values to the dictionary, using that index value as the key.
    3. Generate a new index as sorted(merge_dict).
    4. Generate a new list of data for each column by looping through merge_dict.items().
    5. Create a new data frame with index=sorted(merge_dict) and columns created in the previous step.

    Basically, this is somewhat like a hash join in SQL. It seems like the most efficient approach I can think of, and it shouldn't take long to code up.
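The steps above could be sketched roughly as follows. This is only an illustration, not a tuned implementation: the function name `hash_join` and its parameters are hypothetical, the result is an outer join (keys found in any frame are kept, with NaN where a frame lacks them), and if a key repeats within one frame the later row simply overwrites the earlier one, unlike `pd.merge`, which would emit every matching combination.

```python
import pandas as pd


def hash_join(dfs, keys=('org', 'name'), value_cols=('items', 'spend')):
    """Hypothetical helper: dict-based outer join on `keys`, columns named <col>_df<i>."""
    merge_dict = {}  # Step 1: key tuple -> {output column name: value}
    columns = []
    for i, df in enumerate(dfs, start=1):
        renamed = {col: f'{col}_df{i}' for col in value_cols}
        columns.extend(renamed.values())
        # Step 2: file each row's values under its (org, name) tuple.
        key_rows = df[list(keys)].itertuples(index=False, name=None)
        value_rows = df[list(value_cols)].to_dict('records')
        for key, values in zip(key_rows, value_rows):
            entry = merge_dict.setdefault(key, {})
            for col, val in values.items():
                entry[renamed[col]] = val  # a repeated key overwrites here
    index = sorted(merge_dict)  # Step 3: sorted key tuples
    # Step 4: one list per output column; None where a frame lacks the key.
    data = {col: [merge_dict[key].get(col) for key in index] for col in columns}
    # Step 5: build the frame, with the join keys as a MultiIndex.
    return pd.DataFrame(data, index=pd.MultiIndex.from_tuples(index, names=list(keys)))
```

Calling `hash_join(dfs).reset_index()` would turn org and name back into ordinary columns.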

    Good luck.

  • 2021-02-06 12:40

    You could use functools.reduce to iteratively apply pd.merge to each of the DataFrames:

    result = functools.reduce(merge, dfs)
    

    This is equivalent to

    result = dfs[0]
    for df in dfs[1:]:
        result = merge(result, df)
    

    To pass the on=['org', 'name'] argument, you could use functools.partial to define the merge function:

    merge = functools.partial(pd.merge, on=['org', 'name'])
    

    Since fixing the suffixes parameter in functools.partial would allow only one choice of suffixes, and here we need a different suffix for each pd.merge call, I think it is easiest to rename the DataFrames' columns before calling pd.merge:

    for i, df in enumerate(dfs, start=1):
        df.rename(columns={col:'{}_df{}'.format(col, i) for col in ('items', 'spend')}, 
                  inplace=True)
    

    For example,

    import pandas as pd
    import numpy as np
    import functools
    np.random.seed(2015)
    
    N = 50
    dfs = [pd.DataFrame(np.random.randint(5, size=(N,4)), 
                        columns=['org', 'name', 'items', 'spend']) for i in range(9)]
    for i, df in enumerate(dfs, start=1):
        df.rename(columns={col:'{}_df{}'.format(col, i) for col in ('items', 'spend')}, 
                  inplace=True)
    merge = functools.partial(pd.merge, on=['org', 'name'])
    result = functools.reduce(merge, dfs)
    print(result.head())
    

    yields

       org  name  items_df1  spend_df1  items_df2  spend_df2  items_df3  \
    0    2     4          4          2          3          0          1   
    1    2     4          4          2          3          0          1   
    2    2     4          4          2          3          0          1   
    3    2     4          4          2          3          0          1   
    4    2     4          4          2          3          0          1   
    
       spend_df3  items_df4  spend_df4  items_df5  spend_df5  items_df6  \
    0          3          1          0          1          0          4   
    1          3          1          0          1          0          4   
    2          3          1          0          1          0          4   
    3          3          1          0          1          0          4   
    4          3          1          0          1          0          4   
    
       spend_df6  items_df7  spend_df7  items_df8  spend_df8  items_df9  spend_df9  
    0          3          4          1          3          0          1          2  
    1          3          4          1          3          0          0          3  
    2          3          4          1          3          0          0          0  
    3          3          3          1          3          0          1          2  
    4          3          3          1          3          0          0          3  
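In the random example above, org and name only take values 0 to 4, so each frame repeats its (org, name) pairs and every merge is many-to-many; that is why the first rows of the output repeat the same values. If each pair should appear at most once per frame, `pd.merge`'s `validate` argument can be fixed in the same `functools.partial` to catch duplicates, and `how='outer'` keeps pairs that are missing from some frames. A minimal sketch on small made-up frames (the data is hypothetical):

```python
import functools

import pandas as pd

# Hypothetical frames with unique (org, name) pairs within each frame.
dfs = [
    pd.DataFrame({'org': [1, 2], 'name': ['a', 'b'], 'items': [10, 20], 'spend': [1.5, 2.5]}),
    pd.DataFrame({'org': [2, 3], 'name': ['b', 'c'], 'items': [30, 40], 'spend': [3.5, 4.5]}),
    pd.DataFrame({'org': [1, 2], 'name': ['a', 'b'], 'items': [50, 60], 'spend': [5.5, 6.5]}),
]
# Suffix the value columns without mutating the original frames.
dfs = [df.rename(columns={col: f'{col}_df{i}' for col in ('items', 'spend')})
       for i, df in enumerate(dfs, start=1)]

# how='outer' keeps keys missing from some frames (filled with NaN);
# validate='one_to_one' raises pandas.errors.MergeError on duplicate keys.
merge = functools.partial(pd.merge, on=['org', 'name'], how='outer',
                          validate='one_to_one')
result = functools.reduce(merge, dfs)
print(result)
```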
    