Given a dataframe with different categorical variables, how do I return a cross-tabulation with percentages instead of frequencies?
df = pd.DataFrame({'A' : [...
pd.crosstab(df.A, df.B).apply(lambda r: r/r.sum(), axis=1)
The lambda computes row/row.sum(), and apply with axis=1 runs it on each row, so every row of the result sums to 1. Multiply the result by 100 if you want percentages rather than proportions.
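Here is a minimal, self-contained sketch of the same idea. The values in the sample frame are made up for illustration (only the column names A and B come from the question). It also shows the normalize="index" argument of pd.crosstab, which should give the same row-wise proportions on pandas 0.18.1 or later.

```python
import pandas as pd

# Hypothetical sample data: the column names A and B follow the question,
# but the values are invented for this example.
df = pd.DataFrame({
    "A": ["one", "one", "two", "two", "two", "three"],
    "B": ["x", "y", "x", "x", "y", "y"],
})

# Plain frequency table.
counts = pd.crosstab(df.A, df.B)

# Divide each row by its own total so that every row sums to 1.
row_pct = counts.apply(lambda r: r / r.sum(), axis=1)

# Built-in alternative: let crosstab normalize over each row (index).
row_pct_builtin = pd.crosstab(df.A, df.B, normalize="index")

print(row_pct)
```

Use normalize="columns" to get column-wise proportions instead, or normalize="all" to divide by the grand total.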
(In Python 2, add from __future__ import division at the top of the file so that / always returns a float instead of truncating integer division.)