Question
I am applying different functions to each column in a pandas groupby object. One of these functions is a weighted average, where the weights are the associated values in another column in the DataFrame. However, for a number of my groups the weights sum to zero. Because of this, I get a "Weights sum to zero, can't be normalized" error message when I run the code.
Referring to the code below, for the group defined by col1 value x and col2 value y, the sum of the values in col3 in rows with col1=x and col2=y is zero, creating an error in the weighted average of col5.
Is there any way to make it so that groups for which the weights sum to zero return a "weighted average" value of zero? Thanks!
import numpy as np
import pandas as pd

df = pd.DataFrame([['x','x','x','y','y','y'],['a','a','b','b','c','c'],
                   [0,0,3,4,5,6],[1,1,1,1,1,1],[0,0,4,6,2,8]],
                  ).transpose()
df.columns = ['col1','col2','col3','col4','col5']
# Weighted average of a column, using the matching rows of col3 as weights
weighted_average = lambda x: np.average(x, weights=df.loc[x.index, 'col3'])
averages = df.groupby(['col1','col2']).agg({'col3':'sum',
                                            'col4':'sum',
                                            'col5': weighted_average})
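The error message quoted above comes from np.average itself, which raises a ZeroDivisionError when the weights sum to zero. A minimal reproduction, independent of the DataFrame and using made-up arrays in place of a zero-weight group, might look like this:

```python
import numpy as np

# Hypothetical stand-ins for one group's col5 values and col3 weights
values = np.array([0.0, 0.0])
weights = np.array([0.0, 0.0])  # the weights sum to zero

try:
    np.average(values, weights=weights)
    error = None
except ZeroDivisionError as exc:
    # np.average cannot normalize by a zero weight total
    error = exc

print(type(error).__name__, error)
```

So any group whose col3 values sum to zero will trigger the exception inside the aggregation.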
Answer 1:
We can do the following:
- Write our own function that returns 0 if any value in col3 or col5 is not positive, and otherwise takes the weighted average.
- Concat the sum aggregation with our weighted average.
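def weighted_average(x):
    # Average only when every weight (col3) and every value (col5) is positive
    if (x.col3 > 0).all() and (x.col5 > 0).all():
        return np.average(x.col5, weights=x.col3)
    else:
        return 0

# Sum col3 and col4 per group
averages = df.groupby(['col1','col2']).agg({'col3':'sum',
                                            'col4':'sum'})
# Apply the weighted average to each group's sub-DataFrame
weighted_avg = df.groupby(['col1','col2']).apply(weighted_average)
# Combine both results; the unnamed apply result arrives as column 0
df_averages = pd.concat([averages, weighted_avg], axis=1)\
    .reset_index()\
    .rename({0:'col5'}, axis=1)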
Which yields:
print(df_averages)
           col3  col4      col5
col1 col2
x    a        0     2  0.000000
     b        3     1  4.000000
y    b        4     1  6.000000
     c       11     2  5.272727
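Note that the check in the answer's function requires every value in col3 and col5 to be positive, so a group with weights such as [0, 3] would also return 0 even though its weights do not sum to zero. A minimal sketch of a variant that tests only the sum of the weights, which is the condition the question describes, could look like the following. The name safe_weighted_average is hypothetical, and the DataFrame is built column-wise here (an assumption, not the question's construction) so that the numeric columns keep numeric dtypes:

```python
import numpy as np
import pandas as pd

# Same data as in the question, built column-wise (assumption)
df = pd.DataFrame({
    'col1': ['x', 'x', 'x', 'y', 'y', 'y'],
    'col2': ['a', 'a', 'b', 'b', 'c', 'c'],
    'col3': [0, 0, 3, 4, 5, 6],
    'col4': [1, 1, 1, 1, 1, 1],
    'col5': [0, 0, 4, 6, 2, 8],
})

def safe_weighted_average(values):
    """Weighted average of `values` with col3 as weights; 0 if the weights sum to zero."""
    # Look up the weights for exactly the rows in this group
    weights = df.loc[values.index, 'col3']
    if weights.sum() == 0:
        return 0.0
    return np.average(values, weights=weights)

averages = df.groupby(['col1', 'col2']).agg({
    'col3': 'sum',
    'col4': 'sum',
    'col5': safe_weighted_average,
})
print(averages)
```

This keeps everything in a single agg call, as in the question, at the cost of the function depending on the outer df for its weights.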
Source: https://stackoverflow.com/questions/55650149/applying-weighted-average-function-to-column-in-pandas-groupby-object-but-weigh