Python pandas DataFrame: fill NaNs with a conditional mean

Submitted by 无人久伴 on 2020-04-12 10:49:30

Question


I have the following dataframe:

import numpy as np 
import pandas as pd
df = pd.DataFrame(data={'Cat' : ['A', 'A', 'A','B', 'B', 'A', 'B'],
                        'Vals' : [1, 2, 3, 4, 5, np.nan, np.nan]})

    Cat   Vals
0   A     1
1   A     2
2   A     3
3   B     4
4   B     5
5   A   NaN
6   B   NaN

And I want the NaNs at indexes 5 and 6 to be filled with the conditional mean of 'Vals' for the row's 'Cat' value, namely 2 and 4.5.
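The target values are the per-category means over the non-missing entries only: A gives (1 + 2 + 3) / 3 = 2 and B gives (4 + 5) / 2 = 4.5. A quick check of those numbers, relying on the default behavior of `mean` after `groupby`, which skips NaN:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(data={'Cat': ['A', 'A', 'A', 'B', 'B', 'A', 'B'],
                        'Vals': [1, 2, 3, 4, 5, np.nan, np.nan]})

# mean() skips NaN by default, so each group's mean is computed
# from its non-missing values only.
means = df.groupby('Cat')['Vals'].mean()
print(means.to_dict())  # {'A': 2.0, 'B': 4.5}
```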

The following code works fine:

means = df.groupby('Cat').Vals.mean()
for i in df[df.Vals.isnull()].index:
    df.loc[i, 'Vals'] = means[df.loc[i].Cat]

    Cat   Vals
0   A     1
1   A     2
2   A     3
3   B     4
4   B     5
5   A     2
6   B   4.5

But I'm looking for something nicer, along the lines of this pseudocode:

df.Vals.fillna(df.Vals.mean(Conditionally to column 'Cat'))

Edit: I found this, which is one line shorter, but I'm still not happy with it:

means = df.groupby('Cat').Vals.mean()
df.Vals = df.apply(lambda x: means[x.Cat] if pd.isnull(x.Vals) else x.Vals, axis=1)
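The `apply` call above performs one lookup, `means[x.Cat]`, per row. A vectorized counterpart (an editor's sketch, not taken from the question) does the same lookup for the whole column at once with `Series.map`, then lets `fillna` use the result only where 'Vals' is missing:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(data={'Cat': ['A', 'A', 'A', 'B', 'B', 'A', 'B'],
                        'Vals': [1, 2, 3, 4, 5, np.nan, np.nan]})

means = df.groupby('Cat')['Vals'].mean()

# Series.map looks up each row's Cat label in the index of `means`,
# giving a per-row category mean; fillna only uses it where Vals is NaN.
df['Vals'] = df['Vals'].fillna(df['Cat'].map(means))
print(df['Vals'].tolist())  # [1.0, 2.0, 3.0, 4.0, 5.0, 2.0, 4.5]
```

Because `map` returns a Series with the same index as `df`, no index juggling is needed before or after the fill.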

Answer 1:


We wish to "associate" the Cat values with the NaN locations. When fillna is given a Series, pandas matches its values to rows by index label, so it is natural to set Cat as the index:

df = df.set_index(['Cat'])

Once this is done, fillna works as desired, using the means Series computed in the question (its index is also Cat):

df['Vals'] = df['Vals'].fillna(means)
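The step above works because `fillna` aligns a Series argument on index labels rather than positions. A minimal illustration on a bare Series (the names `vals`, `means`, and `filled` are hypothetical): the labels repeat on the side being filled, just as they do once Cat is the index, while `means` has exactly one entry per label, which the alignment requires:

```python
import numpy as np
import pandas as pd

# Vals indexed by repeated category labels, as after set_index('Cat').
vals = pd.Series([1, 2, 3, 4, 5, np.nan, np.nan],
                 index=['A', 'A', 'A', 'B', 'B', 'A', 'B'], name='Vals')

# One mean per unique label.
means = pd.Series({'A': 2.0, 'B': 4.5})

# Each NaN is replaced by the entry of `means` whose label matches
# that row's label; non-missing values are left untouched.
filled = vals.fillna(means)
print(filled.tolist())  # [1.0, 2.0, 3.0, 4.0, 5.0, 2.0, 4.5]
```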

To return Cat to a column, you could then of course use reset_index:

df = df.reset_index()

Putting it all together:

import pandas as pd
import numpy as np
df = pd.DataFrame(
    {'Cat' : ['A', 'A', 'A','B', 'B', 'A', 'B'], 
     'Vals' : [1, 2, 3, 4, 5, np.nan, np.nan]})

means = df.groupby(['Cat'])['Vals'].mean()
df = df.set_index(['Cat'])
df['Vals'] = df['Vals'].fillna(means)
df = df.reset_index()
print(df)

yields

  Cat  Vals
0   A   1.0
1   A   2.0
2   A   3.0
3   B   4.0
4   B   5.0
5   A   2.0
6   B   4.5
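A variant that skips the set_index/reset_index round trip, and comes close to the one-liner the question asks for, uses `groupby(...).transform('mean')`. This is an alternative sketch, not part of the original answer:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {'Cat': ['A', 'A', 'A', 'B', 'B', 'A', 'B'],
     'Vals': [1, 2, 3, 4, 5, np.nan, np.nan]})

# transform('mean') broadcasts each group's mean back to every row of that
# group, so the result is aligned with df's original index.
group_means = df.groupby('Cat')['Vals'].transform('mean')
df['Vals'] = df['Vals'].fillna(group_means)
print(df['Vals'].tolist())  # [1.0, 2.0, 3.0, 4.0, 5.0, 2.0, 4.5]
```

Since the index is never changed, the original row order and column layout are preserved without a final reset_index.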


Source: https://stackoverflow.com/questions/33457191/python-pandas-dataframe-fill-nans-with-a-conditional-mean
