From Fill in missing row values in pandas dataframe
I have the following dataframe and would like to fill in missing values.
mukey hzdept_r hzde
Use combine_first. It allows you to patch up the missing data on the left dataframe with the matching data on the right dataframe based on same index.
In this case, df1
is on the left and df2
, the means, as the one on the right.
In [48]: df = pd.read_csv('www004.csv')
...: df1 = df.set_index('mukey')
...: df2 = df.groupby('mukey').mean()
In [49]: df1.loc[426178,:]
Out[49]:
hzdept_r hzdepb_r sandtotal_r silttotal_r claytotal_r om_r
mukey
426178 0 36 NaN NaN NaN 72.50
426178 36 66 NaN NaN NaN 72.50
426178 66 152 42.1 37.9 20 0.25
In [50]: df2.loc[426178,:]
Out[50]:
hzdept_r 34.000000
hzdepb_r 84.666667
sandtotal_r 42.100000
silttotal_r 37.900000
claytotal_r 20.000000
om_r 48.416667
Name: 426178, dtype: float64
In [51]: df3 = df1.combine_first(df2)
...: df3.loc[426178,:]
Out[51]:
hzdept_r hzdepb_r sandtotal_r silttotal_r claytotal_r om_r
mukey
426178 0 36 42.1 37.9 20 72.50
426178 36 66 42.1 37.9 20 72.50
426178 66 152 42.1 37.9 20 0.25
Note that the following rows still won't have values in the resulting df3
426162
426163
426174
426174
426255
because they were single rows to begin with, hence, .mean()
doesn't mean anything to them (eh, see what I did there?).