Question
I have a data frame with some values by year and type. I want to replace all NaN values in each year with the mean of the values in that year that have a specific type. I would like to do this in the most elegant way possible. I'm dealing with a lot of data, so less computation would be good as well.
Example:
import numpy as np
import pandas as pd

df = pd.DataFrame({'year': [1, 1, 1, 2, 2, 2],
                   'type': [1, 1, 2, 1, 1, 2],
                   'val': [np.nan, 5, 10, 100, 200, np.nan]})
I want all NaNs, regardless of their type, to be replaced with the mean of the type 1 values in their respective year.
In this example, the NaN in the first row should be replaced with 5 and the NaN in the last row with 150.
My attempt below only fills in missing values for type 1 rows, not type 2:
df['val'] = df['val'].fillna(df.query('type==1').groupby('year')['val'].transform('mean'))
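To see why type 2 rows stay unfilled, it may help to look at the index of the series passed to fillna. Here is a minimal sketch on the example frame, assuming the column names are quoted as strings; the names helper and filled are only for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'year': [1, 1, 1, 2, 2, 2],
                   'type': [1, 1, 2, 1, 1, 2],
                   'val': [np.nan, 5, 10, 100, 200, np.nan]})

# transform('mean') keeps the index of the filtered frame, so only the
# type 1 rows (labels 0, 1, 3, 4) receive a group mean
helper = df.query('type == 1').groupby('year')['val'].transform('mean')
print(helper.index.tolist())  # [0, 1, 3, 4]

# fillna aligns on index labels; label 5 (a type 2 row) is not in helper,
# so its NaN is left in place
filled = df['val'].fillna(helper)
print(filled.isna().tolist())  # [False, False, False, False, False, True]
```

The row with label 5 has no counterpart in helper, which is why the fill has to be keyed on year rather than on the row index.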
Answer 1:
You want map:
# calculate mean val of type 1 by year
s = df[df['type'].eq(1)].groupby('year')['val'].mean()
# map each `year` to the above mean, and use it to fill in the NaN
df['val'] = df['val'].fillna(df['year'].map(s))
Output:
   year  type    val
0     1     1    5.0
1     1     1    5.0
2     1     2   10.0
3     2     1  100.0
4     2     1  200.0
5     2     2  150.0
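If this is needed in more than one place, the map approach can be wrapped in a small helper. This is only a sketch: the function name fill_with_subset_mean and its parameters are made up for illustration and are not part of pandas.

```python
import numpy as np
import pandas as pd

def fill_with_subset_mean(df, group_col, type_col, val_col, type_value):
    """Fill NaN in val_col with the per-group mean taken over rows
    where type_col equals type_value (hypothetical helper)."""
    # one mean per group, computed only from the chosen type
    means = df.loc[df[type_col].eq(type_value)].groupby(group_col)[val_col].mean()
    out = df.copy()  # leave the caller's frame untouched
    # look up each row's group mean and use it only where val_col is NaN
    out[val_col] = out[val_col].fillna(out[group_col].map(means))
    return out

df = pd.DataFrame({'year': [1, 1, 1, 2, 2, 2],
                   'type': [1, 1, 2, 1, 1, 2],
                   'val': [np.nan, 5, 10, 100, 200, np.nan]})

result = fill_with_subset_mean(df, 'year', 'type', 'val', 1)
print(result['val'].tolist())  # [5.0, 5.0, 10.0, 100.0, 200.0, 150.0]
```

Note that a year with no non-missing values of the chosen type has no usable mean, so NaNs in that year stay NaN.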
Answer 2:
Using fillna and matching indexes:
df['val'] = (df.set_index('year').val
               .fillna(df.query('type == 1').groupby(['year']).val.mean())
               .values)
Output:
   year  type    val
0     1     1    5.0
1     1     1    5.0
2     1     2   10.0
3     2     1  100.0
4     2     1  200.0
5     2     2  150.0
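The index matching can be made more visible by splitting the one-liner into steps. This is a rough sketch of the same idea; the intermediate names means, by_year and filled are only illustrative, and to_numpy() is used in place of .values:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'year': [1, 1, 1, 2, 2, 2],
                   'type': [1, 1, 2, 1, 1, 2],
                   'val': [np.nan, 5, 10, 100, 200, np.nan]})

# mean of type 1 values, indexed by year (labels 1 and 2)
means = df.query('type == 1').groupby('year')['val'].mean()

# the values re-indexed by year, so labels repeat: 1, 1, 1, 2, 2, 2
by_year = df.set_index('year')['val']

# fillna aligns the two series on the year labels
filled = by_year.fillna(means)
print(filled.index.tolist())  # [1, 1, 1, 2, 2, 2]

# the result is indexed by year, not by the original row labels, so it is
# written back positionally as a plain array
df['val'] = filled.to_numpy()
print(df['val'].tolist())  # [5.0, 5.0, 10.0, 100.0, 200.0, 150.0]
```

Writing back positionally is safe here because set_index does not reorder the rows.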
Answer 3:
Using mask and transform:
df.fillna({'val': df.val.mask(df.type.ne(1)).groupby(df.year).transform('mean')})
Output:
   year  type    val
0     1     1    5.0
1     1     1    5.0
2     1     2   10.0
3     2     1  100.0
4     2     1  200.0
5     2     2  150.0
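Broken into steps, the mask and transform approach looks roughly like this. It is a sketch for illustration only: the names masked, group_mean and result are not from the answer, and bracket access is used instead of attribute access.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'year': [1, 1, 1, 2, 2, 2],
                   'type': [1, 1, 2, 1, 1, 2],
                   'val': [np.nan, 5, 10, 100, 200, np.nan]})

# hide every value whose type is not 1, keeping the full row index
masked = df['val'].mask(df['type'].ne(1))

# per-year mean of what is left, broadcast back to every row of that year
group_mean = masked.groupby(df['year']).transform('mean')
print(group_mean.tolist())  # [5.0, 5.0, 5.0, 150.0, 150.0, 150.0]

# fillna returns a new frame here; df itself is not modified
result = df.fillna({'val': group_mean})
print(result['val'].tolist())  # [5.0, 5.0, 10.0, 100.0, 200.0, 150.0]
```

Because the mask keeps all six row labels, the helper series lines up with every row, including the type 2 rows. To keep the filled values, assign the result back, for example df = df.fillna(...).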
Source: https://stackoverflow.com/questions/58509650/how-do-you-fill-nan-with-mean-of-a-subset-of-a-group