In pandas crosstab, how to calculate weighted averages? And how to add row and column totals?

Question

I have a pandas dataframe with two categorical variables (in my example, city and colour), a column with percentages, and one with weights. I want to do a crosstab of city and colour, showing, for each combination of the two, the weighted average of perc.

I have managed to do it with the code below. I first create a column with weight x perc, then build one crosstab with the sum of (weight x perc) and another crosstab with the sum of weight, and finally divide the first by the second.

It works, but is there a quicker/more elegant way to do it?

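import pandas as pd
import numpy as np

# Build a small random example: weights in [0, 100), percentages in [0, 1)
np.random.seed(123)
df = pd.DataFrame()
myrows = 10
df['weight'] = np.random.rand(myrows) * 100

np.random.seed(321)
df['perc'] = np.random.rand(myrows)
df['weight x perc'] = df['weight'] * df['perc']
df['colour'] = np.where(df['perc'] < 0.5, 'red', 'yellow')

np.random.seed(555)
df['city'] = np.where(np.random.rand(myrows) < 0.5, 'NY', 'LA')

# Numerator: sum of weight x perc per (city, colour); margins=True adds 'All' totals
num = pd.crosstab(df['city'], df['colour'], values=df['weight x perc'], aggfunc='sum', margins=True)
# Denominator: sum of weight per (city, colour), with the same margins
den = pd.crosstab(df['city'], df['colour'], values=df['weight'], aggfunc='sum', margins=True)

# Weighted average = sum(weight x perc) / sum(weight), cell by cell (margins included)
out = num / den

print(out)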
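The margins come out right because a weighted average is a ratio of sums, sum(weight x perc) / sum(weight), and the 'All' row and column of num and den are exactly those sums taken over the whole row or column. The sketch below rebuilds the question's data with the same seeds (so df, num, den and out mirror the code above; ny_total, red_total and grand_total are illustrative names) and compares a few margin cells of out with np.average on the corresponding subsets:

```python
import numpy as np
import pandas as pd

# Rebuild the question's data (same seeds, same order of random draws)
np.random.seed(123)
df = pd.DataFrame({'weight': np.random.rand(10) * 100})
np.random.seed(321)
df['perc'] = np.random.rand(10)
df['colour'] = np.where(df['perc'] < 0.5, 'red', 'yellow')
np.random.seed(555)
df['city'] = np.where(np.random.rand(10) < 0.5, 'NY', 'LA')

# The question's approach: ratio of two summed crosstabs, margins included
num = pd.crosstab(df['city'], df['colour'], values=df['weight'] * df['perc'],
                  aggfunc='sum', margins=True)
den = pd.crosstab(df['city'], df['colour'], values=df['weight'],
                  aggfunc='sum', margins=True)
out = num / den

# Weighted averages computed directly on the subsets behind three margin cells
ny = df['city'] == 'NY'
red = df['colour'] == 'red'
ny_total = np.average(df.loc[ny, 'perc'], weights=df.loc[ny, 'weight'])
red_total = np.average(df.loc[red, 'perc'], weights=df.loc[red, 'weight'])
grand_total = np.average(df['perc'], weights=df['weight'])

print(np.isclose(out.loc['NY', 'All'], ny_total),
      np.isclose(out.loc['All', 'red'], red_total),
      np.isclose(out.loc['All', 'All'], grand_total))
```

Averaging the inner cells of out instead (e.g. out.loc['NY', ['red', 'yellow']].mean()) would generally not match, because that ignores how much weight falls into each cell.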

Answer 1:

Here is an approach using groupby with apply() and NumPy's weighted average function, np.average:

df.groupby(['colour','city']).apply(lambda x: np.average(x.perc, weights=x.weight)).unstack(level=0)

which gives

colour       red    yellow
city                      
LA      0.173870  0.865636
NY      0.077912  0.687400
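The lambda passed to apply() is called once per group in Python. A vectorized variant (my own sketch, not part of the original answer) sums weight x perc and weight per (city, colour) with a plain groupby sum and then divides, which is the question's ratio-of-sums idea without crosstab. The column name wp and the variable names are illustrative, and the question's data is rebuilt with the same seeds:

```python
import numpy as np
import pandas as pd

# Rebuild the question's data (same seeds, same order of random draws)
np.random.seed(123)
df = pd.DataFrame({'weight': np.random.rand(10) * 100})
np.random.seed(321)
df['perc'] = np.random.rand(10)
df['colour'] = np.where(df['perc'] < 0.5, 'red', 'yellow')
np.random.seed(555)
df['city'] = np.where(np.random.rand(10) < 0.5, 'NY', 'LA')

# Sum weight*perc and weight per (city, colour) in one groupby, then divide;
# no Python-level function is called per group
sums = (df.assign(wp=df['weight'] * df['perc'])
          .groupby(['city', 'colour'])[['wp', 'weight']]
          .sum())
table = (sums['wp'] / sums['weight']).unstack('colour')

# Reference: the answer's apply()-based version, for comparison
via_apply = (df.groupby(['colour', 'city'])[['perc', 'weight']]
               .apply(lambda x: np.average(x['perc'], weights=x['weight']))
               .unstack(level=0))

print(table)
```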

This doesn't include the All margins, though.

The totals can be computed separately (by colour and by city, respectively):

df.groupby(['colour']).apply(lambda x: np.average(x.perc, weights=x.weight))
df.groupby(['city']).apply(lambda x: np.average(x.perc, weights=x.weight))

Granted, these are still not packaged into a single frame, and the overall (All, All) total is not computed.
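To get everything into one frame with All margins, the sketch below (not from the original answer; wavg, result, sums and ratio are illustrative names) assembles the inner cells, the per-city totals, the per-colour totals and the grand total using the answer's np.average approach. It then cross-checks the result against a single pivot_table call with margins=True that applies the question's ratio of sums. The question's data is rebuilt with the same seeds:

```python
import numpy as np
import pandas as pd

# Rebuild the question's data (same seeds, same order of random draws)
np.random.seed(123)
df = pd.DataFrame({'weight': np.random.rand(10) * 100})
np.random.seed(321)
df['perc'] = np.random.rand(10)
df['colour'] = np.where(df['perc'] < 0.5, 'red', 'yellow')
np.random.seed(555)
df['city'] = np.where(np.random.rand(10) < 0.5, 'NY', 'LA')


def wavg(g):
    """Weighted average of perc within one group, weighted by weight."""
    return np.average(g['perc'], weights=g['weight'])


cols = ['perc', 'weight']

# Inner cells: rows are cities, columns are colours
result = df.groupby(['city', 'colour'])[cols].apply(wavg).unstack('colour')
# Per-city totals become the 'All' column (aligned on the city index)
result['All'] = df.groupby('city')[cols].apply(wavg)
# Per-colour totals become the 'All' row; its 'All' cell starts as NaN
result.loc['All'] = df.groupby('colour')[cols].apply(wavg)
# Grand total: weighted average over the whole frame
result.loc['All', 'All'] = wavg(df)
print(result)

# One-call alternative: the question's ratio of sums via pivot_table with margins
sums = (df.assign(wp=df['weight'] * df['perc'])
          .pivot_table(index='city', columns='colour', values=['wp', 'weight'],
                       aggfunc='sum', margins=True))
ratio = sums['wp'] / sums['weight']
print(ratio)
```

The pivot_table version is shorter because margins=True produces the All row and column for both sums at once; the groupby version spells each total out explicitly.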

Source: https://stackoverflow.com/questions/47059124/in-pandas-crosstab-how-to-calculate-weighted-averages-and-how-to-add-row-and-c

Tags

python

pandas

crosstab

categorical-data