Question
Assume we have a pandas dataframe like this:
a b id
36 25 2
40 25 3
46 23 2
40 22 5
42 20 5
56 39 3
I would like to perform an operation (a div b), then group by id, and finally calculate a weighted average using "a" as the weights. It works when I only calculate the mean.
import pandas as pd
import numpy as np
df = pd.read_csv('file', sep=r'\s+')
a = (df['a'].div(df['b'])).groupby(df['id']).mean()  # works fine
b = (df['a'].div(df['b'])).groupby(df['id']).apply(lambda x: np.average(x ???, weights=x['a']))
I don't know how to pass the values of df['a'].div(df['b']) as the first argument of NumPy's average function. Any ideas?
Expected Output:
id Weighted Average
0 2 1.754146
1 3 1.504274
2 5 1.962528
Answer 1:
Are you looking to compute the weighted average grouped by id?
df.groupby('id').apply(lambda x: np.average(x['b'],weights=x['a'])).reset_index(name='Weighted Average')
Out[1]:
id Weighted Average
0 2 23.878049
1 3 33.166667
2 5 20.975610
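To see where the first number comes from, here is a minimal sketch that recomputes the id == 2 group by hand. It assumes only the two sample rows from the question; the variable names manual and weighted are illustrative.

```python
import numpy as np

# Rows with id == 2 from the sample data in the question.
a = np.array([36, 46])  # used as weights
b = np.array([25, 23])  # values being averaged

# np.average with weights computes sum(w * x) / sum(w).
manual = (a * b).sum() / a.sum()
weighted = np.average(b, weights=a)

print(round(float(weighted), 6))  # 23.878049
```

Both expressions give the same value, which matches the first row of Out[1].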
Or, if you want the weighted average of a / b:
(df.groupby('id').apply(lambda x: np.average(x['a']/x['b'],weights=x['a']))
.reset_index(name='Weighted Average'))
Out[2]:
id Weighted Average
0 2 1.754146
1 3 1.504274
2 5 1.962528
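The same result can probably also be obtained without apply, by computing the weighted sum and the weight total per group separately. This is a sketch of an alternative, not part of the original answer; the DataFrame is built inline from the sample data, and the names weighted_sum and weight_total are illustrative.

```python
import pandas as pd

# Sample data from the question, built inline instead of read from a file.
df = pd.DataFrame({
    "a": [36, 40, 46, 40, 42, 56],
    "b": [25, 25, 23, 22, 20, 39],
    "id": [2, 3, 2, 5, 5, 3],
})

# Per-group sum(w * x), with weights w = a and values x = a / b.
weighted_sum = (df["a"] * df["a"] / df["b"]).groupby(df["id"]).sum()
# Per-group sum(w).
weight_total = df["a"].groupby(df["id"]).sum()

result = (weighted_sum / weight_total).reset_index(name="Weighted Average")
print(result)
```

Because this uses only built-in group sums, it tends to be faster than a Python lambda on large frames, and it sidesteps the warning that some recent pandas versions may emit when apply touches the grouping column.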
Source: https://stackoverflow.com/questions/64236587/calculating-weighted-average-in-pandas-using-numpy-function