Pandas percentage of total with groupby

没有蜡笔的小新 2020-11-22 06:41

This is obviously simple, but as a numpy newbie I'm getting stuck.

I have a CSV file that contains 3 columns: the State, the Office ID, and the Sales for that office.

14 Answers
  •  -上瘾入骨i
    2020-11-22 07:11

    A simple approach I have used: compute the two groupby sums, merge them on state, then divide.

    import numpy as np
    import pandas as pd

    np.random.seed(0)
    df = pd.DataFrame({'state': ['CA', 'WA', 'CO', 'AZ'] * 3,
                       'office_id': list(range(1, 7)) * 2,
                       'sales': [np.random.randint(100000, 999999) for _ in range(12)]})

    # Sales per (state, office) and total sales per state.
    state_office = df.groupby(['state', 'office_id'])['sales'].sum().reset_index()
    state = df.groupby(['state'])['sales'].sum().reset_index()

    # Attach each state's total to its offices; the overlapping 'sales' columns
    # get the default suffixes: sales_x (office) and sales_y (state total).
    state_office = state_office.merge(state, on='state', how='left')
    state_office['sales_ratio'] = 100 * (state_office['sales_x'] / state_office['sales_y'])
    print(state_office)
    
       state  office_id  sales_x  sales_y  sales_ratio
    0     AZ          2   222579  1310725    16.981365
    1     AZ          4   252315  1310725    19.250033
    2     AZ          6   835831  1310725    63.768601
    3     CA          1   405711  2098663    19.331879
    4     CA          3   710581  2098663    33.858747
    5     CA          5   982371  2098663    46.809373
    6     CO          1   404137  1096653    36.851857
    7     CO          3   217952  1096653    19.874290
    8     CO          5   474564  1096653    43.273852
    9     WA          2   535829  1543854    34.707233
    10    WA          4   548242  1543854    35.511259
    11    WA          6   459783  1543854    29.781508
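
A possible variation on the same idea, not part of the answer above: `groupby(...).transform('sum')` returns the state total aligned to each row, so the merge step can be skipped. This is a minimal sketch on a small hypothetical data set (the values below are made up for illustration and are not the random data shown above).

```python
import pandas as pd

# Hypothetical toy data, chosen so the percentages are easy to check by hand.
df = pd.DataFrame({
    'state': ['CA', 'CA', 'WA', 'WA'],
    'office_id': [1, 2, 1, 2],
    'sales': [100, 300, 50, 150],
})

# Sum sales per (state, office_id), as in the first groupby above.
state_office = df.groupby(['state', 'office_id'], as_index=False)['sales'].sum()

# transform('sum') broadcasts each state's total back onto its rows,
# so the division can be done directly without a merge.
state_total = state_office.groupby('state')['sales'].transform('sum')
state_office['sales_ratio'] = 100 * state_office['sales'] / state_total

print(state_office)
```

The result has a single `sales` column rather than `sales_x` and `sales_y`, and within each state the `sales_ratio` values sum to 100.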
    
