Pandas: how to use pd.cut()

时光取名叫无心 2020-12-02 17:28

Here is the snippet:

import pandas as pd

test = pd.DataFrame({'days': [0,31,45]})
test['range'] = pd.cut(test.days, [0,30,60])



  •  有刺的猬
    2020-12-02 18:08

    test['range'] = pd.cut(test.days, [0,30,60], include_lowest=True)
    print (test)
       days           range
    0     0  (-0.001, 30.0]
    1    31    (30.0, 60.0]
    2    45    (30.0, 60.0]
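The NaN in the question's output comes down to which side of each bin is closed. Here is a minimal sketch using `pd.Interval`. It only illustrates the membership rule and is not how `pd.cut` works internally. The default bins are right-closed, `right=False` makes them left-closed, and for the first bin, `include_lowest=True` acts like an interval closed on both sides.

```python
import pandas as pd

# Default pd.cut bins are right-closed: (0, 30] means 0 < x <= 30.
default_bin = pd.Interval(0, 30, closed='right')
print(0 in default_bin, 30 in default_bin)    # False True -> days == 0 becomes NaN

# right=False makes them left-closed: [0, 30) means 0 <= x < 30.
left_bin = pd.Interval(0, 30, closed='left')
print(0 in left_bin, 30 in left_bin)          # True False

# include_lowest=True makes only the first bin left-inclusive as well,
# so for membership it behaves like [0, 30].
lowest_bin = pd.Interval(0, 30, closed='both')
print(0 in lowest_bin, 30 in lowest_bin)      # True True
```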

    See the difference between the three options:

    test = pd.DataFrame({'days': [0,20,30,31,45,60]})
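    #0 value is in the first group because include_lowest=True
    test['range1'] = pd.cut(test.days, [0,30,60], include_lowest=True)
    #30 value is in [30, 60) group, 60 is outside every bin (NaN)
    test['range2'] = pd.cut(test.days, [0,30,60], right=False)
    #30 value is in (0, 30] group, 0 is outside every bin (NaN)
    test['range3'] = pd.cut(test.days, [0,30,60])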
    print (test)
       days          range1    range2    range3
    0     0  (-0.001, 30.0]   [0, 30)       NaN
    1    20  (-0.001, 30.0]   [0, 30)   (0, 30]
    2    30  (-0.001, 30.0]  [30, 60)   (0, 30]
    3    31    (30.0, 60.0]  [30, 60)  (30, 60]
    4    45    (30.0, 60.0]  [30, 60)  (30, 60]
    5    60    (30.0, 60.0]       NaN  (30, 60]
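To compare the options by bin position instead of by label, `pd.cut` also accepts `labels=False`. This returns integer bin indicators, with NaN for values outside every bin. Below is a short sketch on the same data. The names `right_closed` and `left_closed` are only illustrative.

```python
import pandas as pd

days = pd.Series([0, 20, 30, 31, 45, 60])
bins = [0, 30, 60]

# labels=False returns each value's 0-based bin position instead of an
# Interval label; values that fall outside every bin come back as NaN.
right_closed = pd.cut(days, bins, labels=False)               # bins (0, 30], (30, 60]
left_closed = pd.cut(days, bins, labels=False, right=False)   # bins [0, 30), [30, 60)

print(pd.DataFrame({'days': days,
                    'right_closed': right_closed,
                    'left_closed': left_closed}))
```

Because NaN cannot be stored in an integer column, pandas returns these positions as floats whenever any value falls outside the bins.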

    Or use numpy.searchsorted, which returns bin positions rather than interval labels. Only the bin edges in arr have to be sorted; the values of days do not:

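    import numpy as np
    test = pd.DataFrame({'days': [0,20,30,31,45,60]})
    arr = np.array([0,30,60])
    #side='left' (default): values <= 0 -> 0, (0, 30] -> 1, (30, 60] -> 2
    test['range1'] = arr.searchsorted(test.days)
    #side='right' minus 1: [0, 30) -> 0, [30, 60) -> 1, values >= 60 -> 2
    test['range2'] = arr.searchsorted(test.days, side='right') - 1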
    print (test)
       days  range1  range2
    0     0       0       0
    1    20       1       0
    2    30       1       1
    3    31       2       1
    4    45       2       1
    5    60       2       2
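The two searchsorted variants can be wrapped so they line up with `pd.cut(..., labels=False)`. `bin_index` below is a hypothetical helper, not a NumPy or pandas function. It is a sketch that assumes ascending edges and returns -1 instead of NaN for values outside every bin, so the result stays an integer array. It does not model `include_lowest`.

```python
import numpy as np
import pandas as pd

def bin_index(values, edges, right=True):
    """Hypothetical helper: 0-based bin position per value, -1 if outside.

    right=True mimics pd.cut's default (a, b] bins; right=False mimics [a, b).
    `edges` must be sorted ascending; `values` can be in any order.
    """
    values = np.asarray(values)
    edges = np.asarray(edges)
    # For (a, b] bins a value equal to b must stay in b's bin, so search on
    # the left side; for [a, b) bins it must move to the next bin, so search
    # on the right side. Subtracting 1 turns the insertion point into a bin.
    side = 'left' if right else 'right'
    idx = np.searchsorted(edges, values, side=side) - 1
    # -1 means below the first edge, len(edges) - 1 means above the last one.
    outside = (idx < 0) | (idx >= len(edges) - 1)
    return np.where(outside, -1, idx)

values = [45, 0, 60, 20, 31, 30]   # deliberately unsorted
edges = [0, 30, 60]
print(bin_index(values, edges).tolist())               # [1, -1, 1, 0, 1, 0]
print(bin_index(values, edges, right=False).tolist())  # [1, 0, -1, 0, 1, 1]
print(pd.cut(values, edges, labels=False))             # same positions, NaN where bin_index gives -1
```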
