Pandas: make pivot table with percentage

醉酒成梦 2020-12-10 21:18

I have a DataFrame:

ID,url,used_at,active_seconds,domain
61a77f9e5fd52a50c10cd2d4d886ec68,mazdaspb.ru,2015-01,6,mazdaspb.ru
61a77f9e5fd52a50c10cd2d4d886ec68,maz         


        
3 Answers
  • 2020-12-10 21:48

    IIUC, you can pass margins=True to pivot_table to add an All row and column of totals, then use div to divide every row by that last All row:

    import pandas as pd

    # count the IDs per month and domain; margins=True adds the All totals
    group = pd.pivot_table(df, 
                           index='used_at', 
                           columns='domain', 
                           values='ID', 
                           aggfunc=len, 
                           margins=True)
    print (group)
    domain   avito.ru  mazdaspb.ru  vw-stat.ru   All
    used_at                                         
    2015-01       3.0          3.0         5.0  11.0
    All           3.0          3.0         5.0  11.0
    
    print (group.iloc[:-1])
    domain   avito.ru  mazdaspb.ru  vw-stat.ru   All
    used_at                                         
    2015-01       3.0          3.0         5.0  11.0
    
    print (group.iloc[-1])
    domain
    avito.ru        3.0
    mazdaspb.ru     3.0
    vw-stat.ru      5.0
    All            11.0
    Name: All, dtype: float64
    
    print (group.iloc[:-1].div(group.iloc[-1], axis=1) * 100)
    domain   avito.ru  mazdaspb.ru  vw-stat.ru    All
    used_at                                          
    2015-01     100.0        100.0       100.0  100.0
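
With a single month in the data, dividing by the All row can only give 100 in every cell. The sketch below repeats the same steps on a small made-up frame with two months, so the effect of the division is visible. The sample rows are an assumption (they are not the asker's data), and aggfunc="count" is swapped in for len; on a column without missing values the two give the same counts.

```python
import pandas as pd

# Hypothetical sample data (not the asker's): two months, two domains.
df = pd.DataFrame({
    "ID": ["a", "b", "c", "d", "e", "f"],
    "used_at": ["2015-01"] * 4 + ["2015-02"] * 2,
    "domain": ["avito.ru", "avito.ru", "avito.ru", "vw-stat.ru",
               "avito.ru", "vw-stat.ru"],
})

# Counts per month and domain, plus an "All" row and column of totals.
group = pd.pivot_table(df, index="used_at", columns="domain",
                       values="ID", aggfunc="count", margins=True)

# Divide every month row by the "All" row, column by column.
pct = group.iloc[:-1].div(group.iloc[-1], axis=1) * 100
print(pct)
```

Each column of pct sums to 100, so a cell is that month's share of the domain's total: avito.ru comes out as 75 for 2015-01 and 25 for 2015-02.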
    

    To get each count as a percentage of the total number of rows instead, chain div and mul:

    # wrap the chain in parentheses so it can span several lines
    group = (pd.pivot_table(df, 
                            index='used_at',
                            columns='domain', 
                            values='ID', 
                            aggfunc=len)
               .div(len(df.index))
               .mul(100))
    print (group)
    
    domain    avito.ru  mazdaspb.ru  vw-stat.ru
    used_at                                    
    2015-01  27.272727    27.272727   45.454545
    
  • 2020-12-10 21:49

    Divide each individual count by the total number of rows in the DataFrame to get its percentage distribution, as shown:

    func = lambda x: 100*x.count()/df.shape[0]
    pd.pivot_table(df, index='used_at', columns='domain', values='ID', aggfunc=func)
    

  • 2020-12-10 21:52

    An alternative approach is to use pd.crosstab, which takes inputs similar to pivot_table.

    It has a normalize parameter, which defaults to False.

    Setting normalize=True divides every count by the grand total, so each cell holds its fraction of the total; multiply by 100 to get percentages.
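
A minimal sketch of the crosstab approach. The frame below is made up to match the counts shown in the first answer (3, 3 and 5 rows for one month); the ID values are placeholders, not the asker's data.

```python
import pandas as pd

# Hypothetical sample: 3 + 3 + 5 rows in a single month.
domains = ["avito.ru"] * 3 + ["mazdaspb.ru"] * 3 + ["vw-stat.ru"] * 5
df = pd.DataFrame({
    "ID": [f"id{i}" for i in range(len(domains))],  # placeholder IDs
    "used_at": ["2015-01"] * len(domains),
    "domain": domains,
})

# normalize=True returns fractions of the grand total; scale to percent.
share = pd.crosstab(df["used_at"], df["domain"], normalize=True) * 100
print(share.round(2))
```

normalize also accepts "index" or "columns" to compute the shares within each row or within each column rather than over the whole table.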
