Using Apply in Pandas Lambda functions with multiple if statements

Backend · Unresolved · 4 answers · 1579 views
一生所求 2021-02-09 12:08

I'm trying to infer a classification according to the size of a person in a dataframe like this one:

      Size
1     80000
2     8000000
3     8000000000
...
        
4 answers
  •  名媛妹妹
    2021-02-09 12:35

    Using NumPy's searchsorted:

    import numpy as np

    labels = np.array(['<1m', '1-10m', '10-50m', '>50m'])
    bins = np.array([1E6, 1E7, 5E7])

    # I prefer assign because it returns a copy of df with the new column and leaves df unchanged
    df_new = df.assign(Classification=labels[bins.searchsorted(df['Size'].values)])
    

    If you want to add the new column to the existing dataframe instead:

    df['Classification'] = labels[bins.searchsorted(df['Size'].values)]
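A self-contained sketch of the approach above, assuming the three sample rows from the question (index 1 to 3, column Size) are the whole dataframe. The name result is chosen here for illustration only.

```python
import numpy as np
import pandas as pd

# Sample data from the question: index 1-3, one column "Size"
df = pd.DataFrame({"Size": [80000, 8000000, 8000000000]}, index=[1, 2, 3])

# One more label than bin edges: below the first edge, between edges, above the last edge
labels = np.array(['<1m', '1-10m', '10-50m', '>50m'])
bins = np.array([1E6, 1E7, 5E7])

# searchsorted maps each size to an index 0..len(bins), which then picks a label
result = df.assign(Classification=labels[bins.searchsorted(df['Size'].values)])

print(result)
```

The three rows come out as '<1m', '1-10m' and '>50m', and df itself still has only the Size column because assign works on a copy.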
    

    Some explanation

    From the docs for np.searchsorted:

    Find indices where elements should be inserted to maintain order.

    Find the indices into a sorted array a such that, if the corresponding elements in v were inserted before the indices, the order of a would be preserved.
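To see what the quoted definition means at the bin edges, here is a small sketch with a few hand-picked test values (my own, not from the question). With the default side='left', a value equal to an edge gets that edge's index, so exactly 1,000,000 would be labelled '<1m'; passing side='right' moves it to '1-10m'.

```python
import numpy as np

bins = np.array([1E6, 1E7, 5E7])

# Test values: below the first edge, on an edge, between edges, on the last edge, above all edges
values = np.array([5E5, 1E6, 2E7, 5E7, 9E9])

left = bins.searchsorted(values)                 # default side='left'
right = bins.searchsorted(values, side='right')  # equal values go after the edge

print(left)   # [0 0 2 2 3]
print(right)  # [0 1 2 3 3]
```

Note that no index is ever negative: the largest possible result is len(bins), here 3.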

    The labels array has one more element than bins. searchsorted returns an index between 0 and len(bins): 0 for values at or below bins[0], and len(bins) (here 3) for values greater than the largest edge. Indexing labels with 3 therefore grabs the last label, '>50m'.
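For comparison with the apply approach in the question title: a lambda body can only be a single expression, so an if/elif chain is usually written as a named function and passed to Series.apply. The function classify below is a hypothetical helper; its <= comparisons are chosen to match searchsorted's default side='left', and its last branch corresponds to index len(bins), the extra fourth label. The sample sizes are my own and include values sitting exactly on the edges, to check that both approaches agree.

```python
import numpy as np
import pandas as pd

labels = np.array(['<1m', '1-10m', '10-50m', '>50m'])
bins = np.array([1E6, 1E7, 5E7])

def classify(size):
    """Hypothetical if/elif version of the searchsorted lookup (side='left')."""
    if size <= 1E6:
        return '<1m'
    elif size <= 1E7:
        return '1-10m'
    elif size <= 5E7:
        return '10-50m'
    else:
        return '>50m'  # corresponds to index len(bins) == 3

# Includes values exactly on the 1E6 and 5E7 edges
df = pd.DataFrame({"Size": [80000, 1000000, 8000000, 50000000, 8000000000]})

via_apply = df['Size'].apply(classify)                          # one Python call per row
via_searchsorted = labels[bins.searchsorted(df['Size'].values)]  # vectorized

print(via_apply.tolist() == via_searchsorted.tolist())  # True
```

Both give the same labels; the searchsorted version avoids calling a Python function once per row, which is why it tends to be faster on large frames.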
