In Pandas, how can I patch a dataframe with missing values with values from another dataframe given a similar index?

一生所求 2021-01-20 02:44

This follows on from the question "Fill in missing row values in pandas dataframe".

I have the following dataframe and would like to fill in missing values.

mukey   hzdept_r    hzde         


        
2 answers
  •  爱一瞬间的悲伤
    2021-01-20 03:31

    Use combine_first. It patches the missing values in the left dataframe with the values at the same index label and column in the right dataframe.

    In this case, df1 (the original data indexed by mukey) is on the left, and df2 (the per-mukey means) is on the right.

    In [48]: df = pd.read_csv('www004.csv')
        ...: df1 = df.set_index('mukey')
        ...: df2 = df.groupby('mukey').mean()
    
    In [49]: df1.loc[426178,:]
    Out[49]: 
            hzdept_r  hzdepb_r  sandtotal_r  silttotal_r  claytotal_r   om_r
    mukey                                                                   
    426178         0        36          NaN          NaN          NaN  72.50
    426178        36        66          NaN          NaN          NaN  72.50
    426178        66       152         42.1         37.9           20   0.25
    
    In [50]: df2.loc[426178,:]
    Out[50]: 
    hzdept_r       34.000000
    hzdepb_r       84.666667
    sandtotal_r    42.100000
    silttotal_r    37.900000
    claytotal_r    20.000000
    om_r           48.416667
    Name: 426178, dtype: float64
    
    In [51]: df3 = df1.combine_first(df2)
        ...: df3.loc[426178,:]
    Out[51]: 
            hzdept_r  hzdepb_r  sandtotal_r  silttotal_r  claytotal_r   om_r
    mukey                                                                   
    426178         0        36         42.1         37.9           20  72.50
    426178        36        66         42.1         37.9           20  72.50
    426178        66       152         42.1         37.9           20   0.25
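
    The same steps can be sketched as a self-contained script. The data below is invented for illustration (it is not the contents of www004.csv), and only two of the columns are reproduced, so treat it as a minimal sketch of the technique rather than the exact session above.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for www004.csv; the values are made up for this sketch.
df = pd.DataFrame({
    "mukey": [1, 1, 1, 2],
    "sandtotal_r": [np.nan, np.nan, 42.1, np.nan],
    "om_r": [72.5, 72.5, 0.25, 3.0],
})

# Left side: the original rows, indexed by mukey (the index has duplicates).
df1 = df.set_index("mukey")

# Right side: one row of column means per mukey (NaN values are skipped).
df2 = df.groupby("mukey").mean()

# Fill NaN cells in df1 with the value at the same index label and column in df2.
df3 = df1.combine_first(df2)
print(df3)
```

    Values that were already present in df1 are left alone; only the NaN cells are taken from df2.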
    

    Note that the following rows still won't have values in the resulting df3:

    426162
    426163
    426174
    426174
    426255
    

    because they were single rows to begin with: the group mean of a lone missing value is itself NaN, hence .mean() doesn't mean anything to them (eh, see what I did there?).
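
    To see which index values are still incomplete after patching, something like the following should work. Again the data is hypothetical: mukey 10 has one usable value to average, while mukey 20 has none, so its mean is NaN and there is nothing to patch with.

```python
import numpy as np
import pandas as pd

# Hypothetical data: mukey 10 has one known value, mukey 20 has none.
df = pd.DataFrame({
    "mukey": [10, 10, 20],
    "claytotal_r": [np.nan, 20.0, np.nan],
})

df1 = df.set_index("mukey")
df2 = df.groupby("mukey").mean()  # the mean for mukey 20 is NaN
df3 = df1.combine_first(df2)

# Boolean mask of rows that still contain at least one NaN after patching.
mask = df3.isna().any(axis=1).to_numpy()

# Distinct index labels of those rows.
leftover = df3.index[mask].unique().tolist()
print(leftover)
```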
