I have a dataset with weights for each observation and I want to prepare weighted summaries using groupby
but am rusty as to how to best do this. I think it imp
Simply multiply the two columns:
In [11]: df_city['weighted_jobs'] = df_city['weight'] * df_city['jobs']
Now you can groupby the city (and take the sum):
In [12]: df_city_sums = df_city.groupby('city').sum()
In [13]: df_city_sums
Out[13]:
           jobs  weight  weighted_jobs
city
oakland     362     690           7958
san mateo   367    1017           9026
sf          253     638           6209
[3 rows x 3 columns]
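Now divide one sum by the other. The line below divides by the summed jobs, which is what produces the output shown; for the usual weighted mean of jobs, divide by df_city_sums['weight'] instead: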
In [14]: df_city_sums['weighted_jobs'] / df_city_sums['jobs']
Out[14]:
city
oakland      21.983425
san mateo    24.594005
sf           24.541502
dtype: float64
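For reference, here is a minimal self-contained sketch of the same three steps. The toy data is made up for illustration and is not the original dataset. It also assumes the conventional definition of a weighted mean, sum(weight * jobs) / sum(weight), so the final division uses the weight column. The np.average cross-check is only there to confirm the arithmetic.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; values are illustrative, not the original dataset.
df_city = pd.DataFrame({
    "city":   ["sf", "sf", "oakland", "oakland"],
    "jobs":   [10, 20, 30, 50],
    "weight": [1, 3, 2, 2],
})

# Step 1: per-row weighted value.
df_city["weighted_jobs"] = df_city["weight"] * df_city["jobs"]

# Step 2: per-city sums of the numeric columns.
df_city_sums = df_city.groupby("city")[["jobs", "weight", "weighted_jobs"]].sum()

# Step 3: weighted mean, assuming the definition sum(weight * jobs) / sum(weight).
weighted_mean = df_city_sums["weighted_jobs"] / df_city_sums["weight"]

# Cross-check each group against numpy's weighted average.
check = pd.Series({
    city: np.average(group["jobs"], weights=group["weight"])
    for city, group in df_city.groupby("city")
})

print(weighted_mean)
```

Selecting the numeric columns explicitly before .sum() keeps the aggregation from touching any non-numeric columns the real frame might have.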