How do you fill NaN with mean of a subset of a group?

Question

I have a DataFrame with values by year and type. I want to replace every NaN in each year with the mean of that year's values for one specific type. I'd like to do this as elegantly as possible, and because I'm working with a lot of data, an approach that keeps computation low would be a plus.

Example:

import numpy as np
import pandas as pd

df = pd.DataFrame({'year': [1, 1, 1, 2, 2, 2],
                   'type': [1, 1, 2, 1, 1, 2],
                   'val': [np.nan, 5, 10, 100, 200, np.nan]})

I want all NaNs, regardless of their type, to be replaced with the mean of the type-1 values for their year.

In this example, the first row NaN should be replaced with 5 and the last row should be replaced with 150.
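As a quick sanity check on those expected values, the sketch below rebuilds the example frame and computes the per-year mean over type-1 rows only. It relies on pandas' `mean()` skipping NaN by default (`skipna=True`); the name `type1_means` is just illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'year': [1, 1, 1, 2, 2, 2],
                   'type': [1, 1, 2, 1, 1, 2],
                   'val': [np.nan, 5, 10, 100, 200, np.nan]})

# Mean of the type-1 values per year; NaN is skipped by default (skipna=True)
type1_means = df.loc[df['type'] == 1].groupby('year')['val'].mean()

# year 1: mean of [NaN, 5]   -> 5.0
# year 2: mean of [100, 200] -> 150.0
print(type1_means.tolist())  # [5.0, 150.0]
```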

My current attempt only fills in values that are missing for type 1, not type 2:

df['val'] = df['val'].fillna(df.query('type == 1').groupby('year')['val'].transform('mean'))
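The attempt misses type 2 because of index alignment: `transform` on the filtered frame returns a Series that only carries the index labels of the type-1 rows, and `fillna` matches fill values to rows by label. A minimal sketch of that behaviour, using boolean indexing, which selects the same rows as the `query` call (`partial` and `attempt` are illustrative names):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'year': [1, 1, 1, 2, 2, 2],
                   'type': [1, 1, 2, 1, 1, 2],
                   'val': [np.nan, 5, 10, 100, 200, np.nan]})

# transform() keeps the index of the filtered frame: only rows 0, 1, 3, 4
partial = df.loc[df['type'] == 1].groupby('year')['val'].transform('mean')
print(partial.index.tolist())  # [0, 1, 3, 4]

# fillna aligns on index labels, so row 5 (type 2) gets no fill value
attempt = df['val'].fillna(partial)
print(attempt.isna().tolist())  # [False, False, False, False, False, True]
```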

Answer 1:

You want `map`:

# calculate mean val of type 1 by year
s = df[df['type'].eq(1)].groupby('year')['val'].mean()

# map each row's `year` to the mean above, then use it to fill the NaNs
df['val'] = df['val'].fillna(df['year'].map(s))

Output:

   year  type    val
0     1     1    5.0
1     1     1    5.0
2     1     2   10.0
3     2     1  100.0
4     2     1  200.0
5     2     2  150.0
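The same `map` idea can be wrapped in a small reusable function. This is a sketch, not part of the original answer: `fill_na_with_subset_mean` and its parameters are hypothetical names, and the extra year-3 row (type 2 only) is added to show that a year with no type-1 values maps to NaN and is left unfilled.

```python
import numpy as np
import pandas as pd

def fill_na_with_subset_mean(df, group_col, filter_col, filter_value, value_col):
    """Hypothetical helper: fill NaNs in value_col with the per-group mean
    computed only over rows where filter_col == filter_value."""
    subset_means = (df.loc[df[filter_col] == filter_value]
                      .groupby(group_col)[value_col]
                      .mean())
    # map turns each row's group key into that group's subset mean;
    # groups with no matching rows map to NaN and therefore stay NaN
    return df[value_col].fillna(df[group_col].map(subset_means))

df = pd.DataFrame({'year': [1, 1, 1, 2, 2, 2, 3],
                   'type': [1, 1, 2, 1, 1, 2, 2],
                   'val': [np.nan, 5, 10, 100, 200, np.nan, np.nan]})

filled = fill_na_with_subset_mean(df, 'year', 'type', 1, 'val')
print(filled.tolist())  # [5.0, 5.0, 10.0, 100.0, 200.0, 150.0, nan]
```

The work is one groupby aggregation followed by a vectorized lookup, so no per-row Python code runs, which should suit the question's concern about large data.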

Answer 2:

Using `fillna` and matching indexes:

df['val'] = (df.set_index('year').val
              .fillna(df.query('type == 1').groupby(['year']).val.mean())
              .values)

   year  type    val
0     1     1    5.0
1     1     1    5.0
2     1     2   10.0
3     2     1  100.0
4     2     1  200.0
5     2     2  150.0
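To see why Answer 2 works, it helps to look at the intermediate index. In this minimal sketch (variable names are illustrative), `set_index('year')` labels each value by its year, so `fillna` can match it against the per-year means; `.to_numpy()`, the recommended alternative to `.values` in recent pandas, then drops the year labels so the result is assigned back to `df` by position rather than by label.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'year': [1, 1, 1, 2, 2, 2],
                   'type': [1, 1, 2, 1, 1, 2],
                   'val': [np.nan, 5, 10, 100, 200, np.nan]})

# Per-year mean of the type-1 values; its index is the unique years [1, 2]
means = df.loc[df['type'] == 1].groupby('year')['val'].mean()

# Label the values by year; the index now repeats: [1, 1, 1, 2, 2, 2]
by_year = df.set_index('year')['val']
filled = by_year.fillna(means)  # each NaN picks up the mean for its year label
print(filled.index.tolist())  # [1, 1, 1, 2, 2, 2]

# Drop the year labels so assignment goes by row position, not by label
df['val'] = filled.to_numpy()
print(df['val'].tolist())  # [5.0, 5.0, 10.0, 100.0, 200.0, 150.0]
```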

Answer 3:

`mask` and `transform`

# fillna returns a new DataFrame, so assign it back to keep the result
df = df.fillna({'val': df.val.mask(df.type.ne(1)).groupby(df.year).transform('mean')})

   year  type    val
0     1     1    5.0
1     1     1    5.0
2     1     2   10.0
3     2     1  100.0
4     2     1  200.0
5     2     2  150.0
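Answer 3 can be unpacked step by step. In this sketch, `mask` first hides every value whose type is not 1, so the grouped mean only sees type-1 values; `transform` then broadcasts each year's mean back to every row of that year, producing a Series aligned with `df` that `fillna` can use directly. The intermediate names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'year': [1, 1, 1, 2, 2, 2],
                   'type': [1, 1, 2, 1, 1, 2],
                   'val': [np.nan, 5, 10, 100, 200, np.nan]})

# mask replaces values with NaN wherever the condition is True (type != 1)
masked = df['val'].mask(df['type'].ne(1))
print(masked.tolist())  # [nan, 5.0, nan, 100.0, 200.0, nan]

# transform returns one value per original row: the year's type-1 mean
per_row_mean = masked.groupby(df['year']).transform('mean')
print(per_row_mean.tolist())  # [5.0, 5.0, 5.0, 150.0, 150.0, 150.0]

# Only the NaNs in 'val' are filled; existing values are left untouched
result = df.fillna({'val': per_row_mean})
print(result['val'].tolist())  # [5.0, 5.0, 10.0, 100.0, 200.0, 150.0]
```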

Source: https://stackoverflow.com/questions/58509650/how-do-you-fill-nan-with-mean-of-a-subset-of-a-group

Tags

python