Pandas percentage of total with groupby

没有蜡笔的小新 2020-11-22 06:41

This is obviously simple, but as a numpy newbie I'm getting stuck.

I have a CSV file that contains 3 columns: the State, the Office ID, and the Sales for that office.

14 answers
  •  长发绾君心
    2020-11-22 06:55

    Paul H's answer is right that you will have to make a second groupby object, but you can calculate the percentage in a simpler way: group state_office by state (its first index level) and divide the sales column by each state's sum. Copying the beginning of Paul H's answer:

    # From Paul H
    import numpy as np
    import pandas as pd
    np.random.seed(0)
    df = pd.DataFrame({'state': ['CA', 'WA', 'CO', 'AZ'] * 3,
                       'office_id': list(range(1, 7)) * 2,
                       'sales': [np.random.randint(100000, 999999)
                                 for _ in range(12)]})
    state_office = df.groupby(['state', 'office_id']).agg({'sales': 'sum'})
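    # Change: group state_office by state (index level 0) and divide each
    # row by its state's total. group_keys=False keeps the original
    # (state, office_id) index instead of prepending the group key again.
    state_pcts = state_office.groupby(level=0, group_keys=False).apply(
        lambda x: 100 * x / x.sum())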
    

    Returns:

                         sales
    state office_id           
    AZ    2          16.981365
          4          19.250033
          6          63.768601
    CA    1          19.331879
          3          33.858747
          5          46.809373
    CO    1          36.851857
          3          19.874290
          5          43.273852
    WA    2          34.707233
          4          35.511259
          6          29.781508
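
As a possible alternative to apply with a lambda, the same per-state percentage can be sketched with transform('sum'), which returns each state's total aligned to the original rows. The small frame and the pct_of_state column name below are made up for illustration and are not the data from the answer above:

```python
import pandas as pd

# Hypothetical hand-made data (not the random data used in the answer)
df = pd.DataFrame({
    'state': ['CA', 'CA', 'WA', 'WA', 'WA'],
    'office_id': [1, 2, 1, 2, 3],
    'sales': [100, 300, 50, 50, 100],
})

# Total sales per (state, office_id), as in the answer
state_office = df.groupby(['state', 'office_id']).agg({'sales': 'sum'})

# transform('sum') broadcasts each state's total back onto every row of
# that state, so a plain element-wise division gives the percentage.
state_totals = state_office.groupby(level=0)['sales'].transform('sum')
state_office['pct_of_state'] = 100 * state_office['sales'] / state_totals

print(state_office)
```

Because transform keeps the original index, the result can be stored as a new column next to sales, and the percentages within each state add up to 100.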
    
