pandas: fillna with data from another dataframe, based on the same ID

半阙折子戏 2021-01-07 02:25

df1 has missing values:

df1=

    ID age 
    1  12 
    2  na
    3  23
    4  na
    5  na
    6  na 


        
2 Answers
  • 2021-01-07 02:48

    Find the rows where age is null, then fill those positions by looking up each row's ID in df2.

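    # boolean mask of the rows in df1 whose age is missing
    miss_bool = df1.age.isnull()
    # index df2 by ID so ages can be looked up by label
    df2 = df2.set_index('ID')
    
    # for each missing row, fetch the age stored in df2 under the same ID
    df1.loc[miss_bool, 'age'] = df1.loc[miss_bool, 'ID'].apply(lambda x: df2.age[x])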
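As a runnable sketch of the same idea, the snippet below rebuilds df1 from the question and uses a hypothetical df2 (its contents are an assumption, chosen to agree with the outputs shown in the other answer). It swaps `apply` with a lambda for `Series.map`, which is a different technique from the one above: a label lookup such as `df2.age[x]` raises a `KeyError` when an ID is absent from df2, whereas `map` leaves that row as NaN.

```python
import numpy as np
import pandas as pd

# df1 mirrors the question; df2 is a hypothetical lookup table (assumption).
df1 = pd.DataFrame({"ID": [1, 2, 3, 4, 5, 6],
                    "age": [12, np.nan, 23, np.nan, np.nan, np.nan]})
df2 = pd.DataFrame({"ID": [2, 4, 5, 6], "age": [4, 5, 6, 7]})

# Series of ages indexed by ID, used as the lookup table.
lookup = df2.set_index("ID")["age"]

# Fill only the rows where age is missing; IDs not in lookup stay NaN.
miss_bool = df1["age"].isnull()
df1.loc[miss_bool, "age"] = df1.loc[miss_bool, "ID"].map(lookup)

print(df1)
```

The column stays float because it held NaN to begin with; cast it with `astype(int)` only once no missing values remain.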
    
  • 2021-01-07 03:01

    You can set ID as the index of both dataframes and then use the fillna() method, which fills missing values while aligning on the index of the two dataframes:

    df1.set_index("ID").age.fillna(df2.set_index("ID").age).reset_index()
    
    #  ID   age
    #0  1   12
    #1  2   4
    #2  3   23
    #3  4   5
    #4  5   6
    #5  6   7
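To illustrate what "aligning on the index" means here, the sketch below uses a hypothetical df2 (an assumption, not the asker's data) whose rows are in a different order and which has no entry for ID 6. The fill goes by ID label rather than row position, and an ID with no match in df2 is simply left missing.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"ID": [1, 2, 3, 4, 5, 6],
                    "age": [12, np.nan, 23, np.nan, np.nan, np.nan]})
# Hypothetical df2: rows shuffled, and ID 6 deliberately absent.
df2 = pd.DataFrame({"ID": [5, 2, 4], "age": [6, 4, 5]})

filled = (
    df1.set_index("ID")["age"]
    .fillna(df2.set_index("ID")["age"])  # matched on the ID index labels
    .reset_index()
)

print(filled)
```

Under these assumptions IDs 2, 4 and 5 receive 4, 5 and 6, and ID 6 remains NaN.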
    

    Another option is combine_first, which takes values from the first dataframe where they are not null and otherwise takes values from the second dataframe, with index and columns matched:

    df1.set_index("ID").combine_first(df2.set_index("ID")).reset_index()
    
    #  ID   age
    #0  1   12.0
    #1  2   4.0
    #2  3   23.0
    #3  4   5.0
    #4  5   6.0
    #5  6   7.0
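One difference from fillna that may matter in practice: combine_first returns the union of both indexes, so IDs that exist only in the second dataframe are added as new rows. A minimal sketch, assuming a hypothetical df2 that carries an extra ID 7:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"ID": [1, 2, 3, 4, 5, 6],
                    "age": [12, np.nan, 23, np.nan, np.nan, np.nan]})
# Hypothetical df2 with an ID (7) that does not appear in df1.
df2 = pd.DataFrame({"ID": [2, 4, 5, 6, 7], "age": [4, 5, 6, 7, 8]})

combined = (
    df1.set_index("ID")
    .combine_first(df2.set_index("ID"))  # df1 wins wherever it is not null
    .reset_index()
)

print(combined)
```

If the result should contain only the IDs already present in df1, the fillna variant keeps df1's index unchanged.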
    