Grouping data by value ranges

再見小時候 2021-01-30 18:29

I have a CSV file that lists parts on order. The columns include days late, qty and commodity.

I need to group the data by ranges of days late and by commodity, and sum the qty for each group.

3 answers
  •  佛祖请我去吃肉
    2021-01-30 19:06

    Suppose you start with this data:

    df = pd.DataFrame({'ID': ('STRSUB BOTDWG'.split())*4,
                       'Days Late': [60, 60, 50, 50, 20, 20, 10, 10],
                       'quantity': [56, 20, 60, 67, 74, 87, 40, 34]})
    #        ID  Days Late  quantity
    # 0  STRSUB         60        56
    # 1  BOTDWG         60        20
    # 2  STRSUB         50        60
    # 3  BOTDWG         50        67
    # 4  STRSUB         20        74
    # 5  BOTDWG         20        87
    # 6  STRSUB         10        40
    # 7  BOTDWG         10        34
    

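    Then you can find the status category using pd.cut. By default, pd.cut splits the Series df['Days Late'] into half-open intervals that exclude the left edge and include the right edge: (-1, 14], (14, 35], (35, 56], (56, 365]. With labels=False it returns the integer position of each row's bin, which is then used to look up the matching name in labels: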

    df['status'] = pd.cut(df['Days Late'], bins=[-1, 14, 35, 56, 365], labels=False)
    labels = np.array('White Yellow Amber Red'.split())
    df['status'] = labels[df['status']]
    del df['Days Late']
    print(df)
    #        ID  quantity  status
    # 0  STRSUB        56     Red
    # 1  BOTDWG        20     Red
    # 2  STRSUB        60   Amber
    # 3  BOTDWG        67   Amber
    # 4  STRSUB        74  Yellow
    # 5  BOTDWG        87  Yellow
    # 6  STRSUB        40   White
    # 7  BOTDWG        34   White
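
    To make the interval edges concrete, here is a minimal sketch of how values that sit exactly on a bin boundary are assigned. The sample values and the names days, bins and names are made up for illustration; passing the label names straight to pd.cut is an alternative to the labels=False lookup used above, not part of the original answer.

```python
import pandas as pd

# Illustrative values chosen to sit on and next to each bin edge.
days = pd.Series([0, 14, 15, 35, 36, 56, 57])
bins = [-1, 14, 35, 56, 365]
names = ['White', 'Yellow', 'Amber', 'Red']

# Default right=True: each bin includes its right edge, so 14 falls in (-1, 14].
status = pd.cut(days, bins=bins, labels=names)
print(status.tolist())

# right=False closes the left edge instead, so 14 falls in [14, 35).
status_left = pd.cut(days, bins=bins, labels=names, right=False)
print(status_left.tolist())
```

    Values outside all bins (for example a negative or a value above 365) come back as NaN, so the outer edges should be chosen to cover the whole range of the data.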
    

    Now use pivot to reshape the DataFrame so that each ID becomes a row and each status becomes a column (pivot requires every ID/status pair to be unique):

    df = df.pivot(index='ID', columns='status', values='quantity')
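
    pivot raises an error when the same ID/status pair appears more than once, which is likely in a real order file. Since the question asks for a sum, one possible variation is pivot_table with aggfunc='sum'. The sketch below uses made-up rows (the orders frame is hypothetical) in which some pairs repeat and the Amber range is empty.

```python
import pandas as pd

# Hypothetical data: STRSUB has two "Red" rows, BOTDWG has two "White" rows.
orders = pd.DataFrame({
    'ID': ['STRSUB', 'STRSUB', 'BOTDWG', 'BOTDWG', 'BOTDWG'],
    'Days Late': [60, 58, 20, 10, 12],
    'quantity': [56, 4, 87, 34, 6],
})
names = ['White', 'Yellow', 'Amber', 'Red']

# Convert the categorical result to plain strings so that only the
# statuses actually present in the data take part in the grouping.
orders['status'] = pd.cut(orders['Days Late'],
                          bins=[-1, 14, 35, 56, 365],
                          labels=names).astype(str)

# Sum quantities for duplicate ID/status pairs instead of failing.
summary = orders.pivot_table(index='ID', columns='status',
                             values='quantity', aggfunc='sum', fill_value=0)

# Put the columns in the desired order; ranges with no rows become 0.
summary = summary.reindex(columns=names[::-1], fill_value=0)
print(summary)
```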
    

    and use reindex to obtain the desired order for the rows and columns:

    df = df.reindex(columns=labels[::-1], index=df.index[::-1])
    

    Thus,

    import numpy as np
    import pandas as pd
    
    df = pd.DataFrame({'ID': ('STRSUB BOTDWG'.split())*4,
                       'Days Late': [60, 60, 50, 50, 20, 20, 10, 10],
                       'quantity': [56, 20, 60, 67, 74, 87, 40, 34]})
    df['status'] = pd.cut(df['Days Late'], bins=[-1, 14, 35, 56, 365], labels=False)
    labels = np.array('White Yellow Amber Red'.split())
    df['status'] = labels[df['status']]
    del df['Days Late']
    df = df.pivot(index='ID', columns='status', values='quantity')
    df = df.reindex(columns=labels[::-1], index=df.index[::-1])
    print(df)
    

    yields

            Red  Amber  Yellow  White
    ID                               
    STRSUB   56     60      74     40
    BOTDWG   20     67      87     34
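
    If a long table of totals is enough, the same binning can feed a plain groupby. This is only a sketch against the question's description: the column names commodity, days late and qty are guesses at the CSV header, and the CSV text is invented so the example runs without a file.

```python
import io
import pandas as pd

# Invented CSV contents; the real file's header names may differ.
csv_text = """commodity,days late,qty
STRSUB,60,56
STRSUB,58,4
BOTDWG,20,87
BOTDWG,10,34
BOTDWG,12,6
"""
parts = pd.read_csv(io.StringIO(csv_text))

# Same ranges and names as in the answer above.
bins = [-1, 14, 35, 56, 365]
names = ['White', 'Yellow', 'Amber', 'Red']
parts['status'] = pd.cut(parts['days late'], bins=bins, labels=names).astype(str)

# One total per (commodity, status) pair that occurs in the data.
totals = parts.groupby(['commodity', 'status'])['qty'].sum()
print(totals)
```

    With a real file, io.StringIO(csv_text) would be replaced by the path to the CSV.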
    
