I'm trying to infer a classification according to the size of a person in a dataframe like this one:
Size
1 80000
2 8000000
3 8000000000
...
Using NumPy's searchsorted
import numpy as np
labels = np.array(['<1m', '1-10m', '10-50m', '>50m'])
bins = np.array([1E6, 1E7, 5E7])
# Using assign is my preference as it produces a copy of df with new column
df.assign(Classification=labels[bins.searchsorted(df['Size'].values)])
If you want to add the new column to the existing dataframe instead:
df['Classification'] = labels[bins.searchsorted(df['Size'].values)]
Some Explanation
From the docs for np.searchsorted:
Find indices where elements should be inserted to maintain order.
Find the indices into a sorted array a such that, if the corresponding elements in v were inserted before the indices, the order of a would be preserved.
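To make the quoted definition concrete, here is a small sketch using the same bins as above. The sample sizes are made up for illustration, chosen so that one value lands in each interval.

```python
import numpy as np

bins = np.array([1E6, 1E7, 5E7])

# Hypothetical sample sizes, one per interval (not from the original data)
sizes = np.array([80000, 8000000, 20000000, 8000000000])

# For each size, the position in bins where it would be inserted
# so that bins stays sorted
idx = bins.searchsorted(sizes)
print(idx)  # [0 1 2 3]
```

Each returned index counts how many bin edges lie strictly below the value, which is why it can be used directly as a position in an array of labels.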
The labels array is one element longer than bins. When a value is greater than the maximum value in bins, searchsorted returns len(bins) (here 3), not -1. That is the index of the last element of labels, so indexing labels with it grabs the last label.
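Putting the pieces together, here is a self-contained sketch that can be run as-is. The dataframe is built from the three sample rows in the question. The boundary check at the end is an addition for illustration: with the default side='left', a value exactly equal to a bin edge falls into the lower class, and side='right' moves it into the upper one.

```python
import numpy as np
import pandas as pd

labels = np.array(['<1m', '1-10m', '10-50m', '>50m'])
bins = np.array([1E6, 1E7, 5E7])

# Sample rows taken from the question
df = pd.DataFrame({'Size': [80000, 8000000, 8000000000]}, index=[1, 2, 3])

# Index of the interval each size falls into, then the matching label
idx = bins.searchsorted(df['Size'].values)
result = df.assign(Classification=labels[idx])
print(result)

# Boundary behaviour for a value exactly on an edge (here 1E6)
edge = np.array([1E6])
edge_left = labels[bins.searchsorted(edge)]                 # default side='left'
edge_right = labels[bins.searchsorted(edge, side='right')]
print(edge_left[0], edge_right[0])
```

Which side to use depends on whether the intervals are meant to be closed on the right (the default) or on the left.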