I have a dataframe:
ID,url,used_at,active_seconds,domain
61a77f9e5fd52a50c10cd2d4d886ec68,mazdaspb.ru,2015-01,6,mazdaspb.ru
61a77f9e5fd52a50c10cd2d4d886ec68,maz
IIUC, you can use the margins parameter of pivot_table to add totals
(an extra row and column labelled All), and then divide every other row
by that last All row with div:
group = pd.pivot_table(df,
                       index='used_at',
                       columns='domain',
                       values='ID',
                       aggfunc=len,
                       margins=True)
print (group)
domain avito.ru mazdaspb.ru vw-stat.ru All
used_at
2015-01 3.0 3.0 5.0 11.0
All 3.0 3.0 5.0 11.0
print (group.iloc[:-1])
domain avito.ru mazdaspb.ru vw-stat.ru All
used_at
2015-01 3.0 3.0 5.0 11.0
print (group.iloc[-1])
domain
avito.ru 3.0
mazdaspb.ru 3.0
vw-stat.ru 5.0
All 11.0
Name: All, dtype: float64
print (group.iloc[:-1].div(group.iloc[-1], axis=1) * 100)
domain avito.ru mazdaspb.ru vw-stat.ru All
used_at
2015-01 100.0 100.0 100.0 100.0
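Because the sample above contains a single month, every percentage comes out as 100. As a rough illustration of the same steps on data spanning two months, here is a minimal sketch. The dataframe in it is made up for the example (the ID values and the second month are assumptions, not the data from the question).

```python
import pandas as pd

# Hypothetical data: two months, two domains (not the original dataframe).
df = pd.DataFrame({
    "ID": list("abcdefgh"),
    "used_at": ["2015-01"] * 3 + ["2015-02"] * 5,
    "domain": ["avito.ru", "mazdaspb.ru", "mazdaspb.ru",
               "avito.ru", "avito.ru", "avito.ru",
               "mazdaspb.ru", "mazdaspb.ru"],
})

# Count rows per (month, domain); margins=True appends the "All" totals.
group = pd.pivot_table(df,
                       index="used_at",
                       columns="domain",
                       values="ID",
                       aggfunc=len,
                       margins=True)

# Divide each month row by the "All" row, column by column, and scale to percent.
percent = group.iloc[:-1].div(group.iloc[-1], axis=1) * 100
print(percent)
```

With this made-up data, avito.ru splits 25/75 between the two months, mazdaspb.ru splits 50/50, and the All column shows each month's share of all rows (37.5 and 62.5).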
A solution that divides each individual count by the total number of rows, using div and mul:
# The outer parentheses let the method chain continue across lines.
group = (pd.pivot_table(df,
                        index='used_at',
                        columns='domain',
                        values='ID',
                        aggfunc=len)
           .div(len(df.index))
           .mul(100))
print (group)
domain avito.ru mazdaspb.ru vw-stat.ru
used_at
2015-01 27.272727 27.272727 45.454545
Divide each individual count by the total number of rows in the DF
to get its percentage distribution, as shown:
func = lambda x: 100*x.count()/df.shape[0]
pd.pivot_table(df, index='used_at', columns='domain', values='ID', aggfunc=func)
An alternative approach is to use pd.crosstab, which takes inputs similar to pivot_table.
It has a normalize parameter, which defaults to normalize=False.
If you change this to normalize=True, each cell becomes a fraction of the grand total; multiply by 100 to get percentages.
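A minimal sketch of the crosstab approach, assuming a dataframe whose per-domain counts match the pivot output shown earlier (3, 3 and 5 rows in 2015-01). The rows themselves are made up for the example.

```python
import pandas as pd

# Hypothetical data: 11 rows in one month, matching the counts shown earlier.
df = pd.DataFrame({
    "used_at": ["2015-01"] * 11,
    "domain": ["avito.ru"] * 3 + ["mazdaspb.ru"] * 3 + ["vw-stat.ru"] * 5,
})

# normalize=True divides every cell by the grand total, giving fractions.
share = pd.crosstab(df["used_at"], df["domain"], normalize=True)

# Scale the fractions to percentages.
percent = share.mul(100)
print(percent.round(2))
```

normalize also accepts 'index' or 'columns' if the percentages should be taken within each row or each column instead of over the whole table.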