Applying weighted average function to column in pandas groupby object, but weights sum to zero

Submitted by 喜夏-厌秋 on 2021-02-05 08:21:11

Question


I am applying different functions to each column in a pandas groupby object. One of these functions is a weighted average, where the weights are the associated values in another column in the DataFrame. However, for a number of my groups the weights sum to zero. Because of this, I get a "Weights sum to zero, can't be normalized" error message when I run the code.
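That message is raised by NumPy, not pandas. `np.average` divides by the sum of the weights and raises `ZeroDivisionError` when that sum is zero. Below is a small reproduction. `average_raises` is a hypothetical helper written only for this illustration.

```python
import numpy as np

def average_raises(values, weights):
    """Return True if np.average rejects these weights (because they sum to zero)."""
    try:
        np.average(values, weights=weights)
    except ZeroDivisionError:
        return True
    return False

# col5 and col3 values of group ('x', 'a'): the weights are [0, 0].
print(average_raises([0, 0], [0, 0]))   # True
# Weights that are nonzero but still sum to zero fail the same way.
print(average_raises([1, 2], [1, -1]))  # True
# Group ('y', 'c'): weights 5 and 6 sum to 11, so the average is defined.
print(average_raises([2, 8], [5, 6]))   # False
```

The test is on the sum, not on individual weights. A group with some zero weights still has a well-defined weighted average as long as the total is nonzero.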

In the code below, consider the group with col1 = 'x' and col2 = 'a'. The col3 values in that group's rows sum to zero, so computing the weighted average of col5 for that group raises the error.

Is there any way to make it so that groups for which the weights sum to zero return a "weighted average" value of zero? Thanks!

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([['x','x','x','y','y','y'],['a','a','b','b','c','c'],
                   [0,0,3,4,5,6],[1,1,1,1,1,1],[0,0,4,6,2,8]]).transpose()
df.columns = ['col1','col2','col3','col4','col5']

# each group's weights are looked up from col3 by row index
weighted_average = lambda x: np.average(x, weights=df.loc[x.index, 'col3'])

# fails for group ('x', 'a'), whose col3 weights sum to zero
averages = df.groupby(['col1','col2']).agg({'col3':'sum',
                                            'col4':'sum',
                                            'col5': weighted_average})
```
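Here is a minimal sketch of the requested behaviour that keeps the single `agg` call and only guards the weighted-average function. It checks the weight sum first and returns 0.0 for zero-sum groups. It assumes 0.0 is the wanted fallback. To make the block self-contained, it rebuilds the sample data with numeric columns instead of the object dtype that `.transpose()` produces.

```python
import numpy as np
import pandas as pd

# Same sample data as above, built column by column so the dtypes are numeric.
df = pd.DataFrame({
    'col1': ['x', 'x', 'x', 'y', 'y', 'y'],
    'col2': ['a', 'a', 'b', 'b', 'c', 'c'],
    'col3': [0, 0, 3, 4, 5, 6],
    'col4': [1, 1, 1, 1, 1, 1],
    'col5': [0, 0, 4, 6, 2, 8],
})

def weighted_average(x):
    # x is one group's col5 values; fetch the matching col3 weights by row index
    weights = df.loc[x.index, 'col3']
    if weights.sum() == 0:
        return 0.0  # fallback for groups whose weights sum to zero
    return np.average(x, weights=weights)

averages = df.groupby(['col1', 'col2']).agg({'col3': 'sum',
                                             'col4': 'sum',
                                             'col5': weighted_average})
print(averages)
```

Like the original lambda, this relies on each group reaching the function with its original row labels, so the weights can be fetched with `df.loc`.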

Answer 1:


We can do the following:

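  • Write our own function that returns 0 when a group's col3 weights sum to zero, and otherwise takes the weighted average of col5.
  • Concatenate the sum aggregation with our weighted average.

```python
import numpy as np
import pandas as pd

def weighted_average(x):
    # x holds one group's col3 and col5 values
    if x['col3'].sum() != 0:
        return np.average(x['col5'], weights=x['col3'])
    else:
        return 0  # weights sum to zero: fall back to 0


averages = df.groupby(['col1','col2']).agg({'col3':'sum',
                                            'col4':'sum'})

# select only the columns the function needs before apply
weighted_avg = df.groupby(['col1','col2'])[['col3','col5']].apply(weighted_average)

# the unnamed Series becomes column 0, so rename it to col5;
# chain .reset_index() if col1/col2 should be ordinary columns
df_averages = pd.concat([averages, weighted_avg], axis=1)\
                .rename({0:'col5'}, axis=1)
```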

Which yields:

```
print(df_averages)
           col3  col4      col5
col1 col2
x    a        0     2  0.000000
     b        3     1  4.000000
y    b        4     1  6.000000
     c       11     2  5.272727
```
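The per-group Python function can also be avoided entirely. Below is a sketch of a vectorized alternative, not part of the answer above. It multiplies col5 by its col3 weight, sums both per group, and then divides only where the total weight is nonzero. It assumes the same sample data with numeric dtypes. The helper column name `weighted` is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'col1': ['x', 'x', 'x', 'y', 'y', 'y'],
    'col2': ['a', 'a', 'b', 'b', 'c', 'c'],
    'col3': [0, 0, 3, 4, 5, 6],
    'col4': [1, 1, 1, 1, 1, 1],
    'col5': [0, 0, 4, 6, 2, 8],
})

keys = ['col1', 'col2']
# Per-group sums of the weights (col3), col4, and the weighted values col3 * col5.
sums = (df.assign(weighted=df['col3'] * df['col5'])
          .groupby(keys)[['col3', 'col4', 'weighted']]
          .sum())

# Start every group at 0.0, then divide only where the total weight is nonzero,
# so zero-weight groups never hit a division by zero.
nonzero = sums['col3'] != 0
sums['col5'] = 0.0
sums.loc[nonzero, 'col5'] = sums.loc[nonzero, 'weighted'] / sums.loc[nonzero, 'col3']

result = sums.drop(columns='weighted')
print(result)
```

On this data it should reproduce the col5 values shown above. For example, group ('y', 'c') gives (5*2 + 6*8) / 11 = 58/11.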


Source: https://stackoverflow.com/questions/55650149/applying-weighted-average-function-to-column-in-pandas-groupby-object-but-weigh
