pandas three-way joining multiple dataframes on columns

醉梦人生 2020-11-22 08:35

I have 3 CSV files. Each has the first column as the (string) names of people, while all the other columns in each dataframe are attributes of that person.

How can

11 Answers
  •  抹茶落季
    2020-11-22 08:57

    A MultiIndex is not needed to perform join operations. You only need to set the index to the column on which the join should be performed (for example with df.set_index('Name')).

    By default, the join operation is performed on the index. In your case, you just have to make the Name column the index. Below is an example.

    A tutorial may be useful.

    # Simple example where the index of each dataframe holds the names
    # on which to perform the join operations
    import pandas as pd
    import numpy as np
    name = ['Sophia' ,'Emma' ,'Isabella' ,'Olivia' ,'Ava' ,'Emily' ,'Abigail' ,'Mia']
    df1 = pd.DataFrame(np.random.randn(8, 3), columns=['A','B','C'], index=name)
    df2 = pd.DataFrame(np.random.randn(8, 1), columns=['D'],         index=name)
    df3 = pd.DataFrame(np.random.randn(8, 2), columns=['E','F'],     index=name)
    df = df1.join(df2)
    df = df.join(df3)
    
    # If you have a 'Name' column that is not the index of your dataframe,
    # you can set this column to be the index
    # 1) Create a column 'Name' based on the previous index
    df1['Name'] = df1.index
    # 2) Set the index from column 'Name'
    df1 = df1.set_index('Name')
    
    # If the indexes differ, you may have to adjust the `how` parameter
    gf1 = pd.DataFrame(np.random.randn(8, 3), columns=['A','B','C'], index=range(8))
    gf2 = pd.DataFrame(np.random.randn(8, 1), columns=['D'], index=range(2,10))
    gf3 = pd.DataFrame(np.random.randn(8, 2), columns=['E','F'], index=range(4,12))
    
    gf = gf1.join(gf2, how='outer')
    gf = gf.join(gf3, how='outer')
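
Applied to the situation in the question (three CSV files whose first column holds the names), the same idea might look like the sketch below. The file contents and column names here are invented for illustration, and `io.StringIO` stands in for real file paths. It shows two options: joining on the index, and keeping 'Name' as a regular column and merging on it.

```python
import io
from functools import reduce

import pandas as pd

# Stand-ins for the three CSV files (hypothetical contents). With real files,
# pass the file paths to pd.read_csv instead of StringIO objects.
csv1 = io.StringIO("Name,A\nSophia,1\nEmma,2\nOlivia,3\n")
csv2 = io.StringIO("Name,B\nSophia,10\nEmma,20\nAva,40\n")
csv3 = io.StringIO("Name,C\nEmma,200\nOlivia,300\nAva,400\n")

# index_col=0 makes the first column (the names) the index while reading.
df1, df2, df3 = (pd.read_csv(f, index_col=0) for f in (csv1, csv2, csv3))

# Option 1: join on the index. An outer join keeps every name that appears
# in any file and fills the missing attributes with NaN.
joined = df1.join(df2, how="outer").join(df3, how="outer")

# Option 2: keep 'Name' as an ordinary column and fold pd.merge over the frames.
frames = [df.reset_index() for df in (df1, df2, df3)]
merged = reduce(
    lambda left, right: pd.merge(left, right, on="Name", how="outer"),
    frames,
)

print(joined)
print(merged)
```

Which option fits better probably depends on whether the names are needed as an index afterwards. With how="inner" instead of "outer", only names present in all three files would be kept.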
    
