Grouping data by value ranges

再見小時候 2021-01-30 18:29

I have a CSV file that lists parts on order. The columns include days late, qty, and commodity.

I need to group the data by days late and commodity, with a sum of the qty for each group.

3 Answers

佛祖请我去吃肉 (original poster)

2021-01-30 19:06

Suppose you start with this data:

df = pd.DataFrame({'ID': ('STRSUB BOTDWG'.split())*4,
                   'Days Late': [60, 60, 50, 50, 20, 20, 10, 10],
                   'quantity': [56, 20, 60, 67, 74, 87, 40, 34]})
#        ID  Days Late  quantity
# 0  STRSUB         60        56
# 1  BOTDWG         60        20
# 2  STRSUB         50        60
# 3  BOTDWG         50        67
# 4  STRSUB         20        74
# 5  BOTDWG         20        87
# 6  STRSUB         10        40
# 7  BOTDWG         10        34

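Then you can find the status category using pd.cut. Note that by default, pd.cut splits the Series df['Days Late'] into half-open intervals that exclude the left edge and include the right edge: (-1, 14], (14, 35], (35, 56], (56, 365]. With labels=False it returns the integer position of each bin (0 to 3) instead of the interval itself, which is then used to look up the status name: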

df['status'] = pd.cut(df['Days Late'], bins=[-1, 14, 35, 56, 365], labels=False)
labels = np.array('White Yellow Amber Red'.split())
df['status'] = labels[df['status']]
del df['Days Late']
print(df)
#        ID  quantity  status
# 0  STRSUB        56     Red
# 1  BOTDWG        20     Red
# 2  STRSUB        60   Amber
# 3  BOTDWG        67   Amber
# 4  STRSUB        74  Yellow
# 5  BOTDWG        87  Yellow
# 6  STRSUB        40   White
# 7  BOTDWG        34   White
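
As a possible shortcut, pd.cut also accepts the label names directly, which skips the NumPy lookup step. The sketch below assumes the same bin edges as above; the variable name `names` and the two boundary values (14 and 15) are only there for illustration.

```python
import pandas as pd

df = pd.DataFrame({'ID': ['STRSUB', 'BOTDWG'] * 4,
                   'Days Late': [60, 60, 50, 50, 20, 20, 10, 10],
                   'quantity': [56, 20, 60, 67, 74, 87, 40, 34]})

# One label per bin, in the same order as the bin edges.
names = ['White', 'Yellow', 'Amber', 'Red']
status = pd.cut(df['Days Late'], bins=[-1, 14, 35, 56, 365], labels=names)
print(status.tolist())

# Boundary check: the right edge is included, so 14 is still White
# and 15 is the first Yellow value.
edge = pd.cut(pd.Series([14, 15]), bins=[-1, 14, 35, 56, 365], labels=names)
print(edge.tolist())
```

The result is an ordered categorical rather than plain strings. Values that fall outside all bins (for example, more than 365 days late) come back as NaN, so the outer bin edges may need widening for real data.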

Now use pivot to get the DataFrame in the desired form (this works here because each ID/status pair occurs only once in the data):

df = df.pivot(index='ID', columns='status', values='quantity')

and use reindex to obtain the desired order for the rows and columns:

df = df.reindex(columns=labels[::-1], index=df.index[::-1])

Thus,

import numpy as np
import pandas as pd

df = pd.DataFrame({'ID': ('STRSUB BOTDWG'.split())*4,
                   'Days Late': [60, 60, 50, 50, 20, 20, 10, 10],
                   'quantity': [56, 20, 60, 67, 74, 87, 40, 34]})
df['status'] = pd.cut(df['Days Late'], bins=[-1, 14, 35, 56, 365], labels=False)
labels = np.array('White Yellow Amber Red'.split())
df['status'] = labels[df['status']]
del df['Days Late']
df = df.pivot(index='ID', columns='status', values='quantity')
df = df.reindex(columns=labels[::-1], index=df.index[::-1])
print(df)

yields

        Red  Amber  Yellow  White
ID                               
STRSUB   56     60      74     40
BOTDWG   20     67      87     34
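
The question asks for a sum of the quantities, and pivot raises a ValueError when the same ID/status pair appears more than once. A minimal sketch of one way to handle that, assuming the real CSV can contain such repeats, is to use pivot_table with aggfunc='sum'. The sample rows below are made up for illustration, and `names` and `table` are hypothetical variable names.

```python
import pandas as pd

# Made-up sample: STRSUB has two rows that both land in the Red bin.
df = pd.DataFrame({'ID': ['STRSUB', 'STRSUB', 'BOTDWG', 'BOTDWG', 'STRSUB'],
                   'Days Late': [60, 70, 50, 20, 10],
                   'quantity': [56, 20, 60, 67, 74]})

names = ['White', 'Yellow', 'Amber', 'Red']
df['status'] = pd.cut(df['Days Late'], bins=[-1, 14, 35, 56, 365],
                      labels=names).astype(str)  # plain strings for grouping

# Sum quantity per (ID, status); combinations with no rows become 0.
table = df.pivot_table(index='ID', columns='status', values='quantity',
                       aggfunc='sum', fill_value=0)

# Put the columns in Red, Amber, Yellow, White order.
table = table.reindex(columns=names[::-1], fill_value=0)
print(table)
```

Here the two Red rows for STRSUB (56 and 20) are combined into 76. The same totals in long form could be obtained with df.groupby(['ID', 'status'])['quantity'].sum().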
