Using Apply in Pandas Lambda functions with multiple if statements

一生所求 2021-02-09 12:08

I'm trying to infer a classification according to the size of a person in a dataframe like this one:

      Size
1     80000
2     8000000
3     8000000000
...
4 Answers
  • 2021-02-09 12:35

    Using NumPy's searchsorted:

    import numpy as np

    labels = np.array(['<1m', '1-10m', '10-50m', '>50m'])
    bins = np.array([1E6, 1E7, 5E7])
    
    # Using assign is my preference as it produces a copy of df with new column
    df.assign(Classification=labels[bins.searchsorted(df['Size'].values)])
    

    If you want to add the new column to the existing dataframe instead:

    df['Classification'] = labels[bins.searchsorted(df['Size'].values)]
    

    Some Explanation

    From the docs for np.searchsorted:

    Find indices where elements should be inserted to maintain order.

    Find the indices into a sorted array a such that, if the corresponding elements in v were inserted before the indices, the order of a would be preserved.

    The labels array is one element longer than bins. When a value is greater than the maximum value in bins, searchsorted returns len(bins), which is exactly the index of the last label, so indexing labels with the result picks the last label.
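To make the index arithmetic above concrete, here is a minimal sketch using the sizes from the question. The variable names `sizes`, `idx`, `edge_left` and `edge_right` are introduced only for illustration. It also shows how the `side` argument decides which class a value lying exactly on a bin edge falls into.

```python
import numpy as np

labels = np.array(['<1m', '1-10m', '10-50m', '>50m'])
bins = np.array([1e6, 1e7, 5e7])

# Sizes taken from the question's example dataframe
sizes = np.array([80_000, 8_000_000, 8_000_000_000])

# Insertion index of each size into the sorted bins array;
# a size above the last bin edge gets index len(bins) == 3
idx = bins.searchsorted(sizes)
classes = labels[idx]

# A value equal to a bin edge: the default side='left' assigns it to the
# lower class, side='right' assigns it to the upper class
edge_left = labels[bins.searchsorted(1e6)]
edge_right = labels[bins.searchsorted(1e6, side='right')]

print(idx.tolist(), classes.tolist(), edge_left, edge_right)
```

If a size of exactly 1,000,000 should count as '1-10m', passing side='right' is probably what you want.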

  • 2021-02-09 12:42

    Here is a small example that you can build upon:

    Basically, lambda x: ... is just a one-line shorthand for a function. All apply really needs is a function, and you can easily write that yourself.

    import pandas as pd
    
    # Recreate the dataframe
    data = dict(Size=[80000,8000000,800000000])
    df = pd.DataFrame(data)
    
    # Create a function that returns the desired values
    # Only the upper bound needs checking in each branch,
    # because smaller values were already caught by an earlier branch
    def func(x):
        if x < 1e6:
            return '<1m'
        elif x < 1e7:
            return '1-10m'
        elif x < 5e7:
            return '10-50m'
        # Add more elif branches here....
        else:
            return 'N/A'
    
    df['Classification'] = df['Size'].apply(func)
    
    print(df)
    

    Returns:

            Size Classification
    0      80000            <1m
    1    8000000          1-10m
    2  800000000            N/A
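As a rough illustration of the point that a lambda is only shorthand for a function, the sketch below applies a named function and an equivalent chained conditional lambda to the same column and checks that the results match. The names `classify` and `classify_lambda`, and the '>50m' label for the last branch, are assumptions made for this example.

```python
import pandas as pd

df = pd.DataFrame({'Size': [80000, 8000000, 800000000]})

def classify(x):
    # Each branch only checks the upper bound
    if x < 1e6:
        return '<1m'
    elif x < 1e7:
        return '1-10m'
    elif x < 5e7:
        return '10-50m'
    return '>50m'  # assumed label for everything above 50m

# The same logic written as a chained conditional expression
classify_lambda = lambda x: ('<1m' if x < 1e6 else
                             '1-10m' if x < 1e7 else
                             '10-50m' if x < 5e7 else
                             '>50m')

by_def = df['Size'].apply(classify)
by_lambda = df['Size'].apply(classify_lambda)

print(by_def.equals(by_lambda))
```

Once there are more than two or three branches, the named function tends to be easier to read and extend than the lambda.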
    
  • 2021-02-09 12:44

    You can use the pd.cut function:

    bins = [0, 1000000, 10000000, 50000000, ...]
    labels = ['<1m','1-10m','10-50m', ...]
    
    df['Classification'] = pd.cut(df['Size'], bins=bins, labels=labels)
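For a version that runs as-is, here is a sketch with the bins closed off by np.inf. The '>50m' label and the choice of right=False are assumptions for this example, not part of the answer above. Note that pd.cut expects one fewer label than bin edges.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Size': [80000, 8000000, 8000000000]})

# Five edges define four intervals, so four labels are needed
bins = [0, 1e6, 1e7, 5e7, np.inf]
labels = ['<1m', '1-10m', '10-50m', '>50m']

# right=False makes each interval include its left edge: [0, 1e6), [1e6, 1e7), ...
df['Classification'] = pd.cut(df['Size'], bins=bins, labels=labels, right=False)

print(df)
```

The resulting column has a categorical dtype; call .astype(str) on it if plain strings are needed.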
    
  • 2021-02-09 12:49

    Using apply with a lambda function actually does the job here. I'm just curious what the problem was, since your syntax looks OK and it works:

    import pandas as pd

    df = pd.DataFrame({'size': [80000, 8000000, 8000000000, 800000000000]})
    # Each condition only needs an upper bound; a value of exactly 1000000 now maps to '1-10m'
    df['Classification'] = df['size'].apply(
        lambda x: '<1m' if x < 1000000 else '1-10m' if x < 10000000 else '1bi')
    df
    

    Output:
