vectorize conditional assignment in pandas dataframe

盖世英雄少女心 2020-12-08 10:24

If I have a DataFrame df with a column x and want to create a column y based on the values of x, following this pseudocode:

    if df['x'] < -2 then df['y'] = 1
    else if df['x'] > 2 then df['y'] = -1
    else df['y'] = 0
2 Answers
  • 2020-12-08 10:48

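    This is a good use case for pd.cut, where you define bin edges and assign a label to each bin. With right=False the bins are closed on the left, i.e. [-inf, -2), [-2, 2) and [2, inf), so a value of exactly 2 is labelled -1 (a strict x > 2 test would give 0), and the resulting column has a categorical dtype: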

    df['y'] = pd.cut(df['x'], [-np.inf, -2, 2, np.inf], labels=[1, 0, -1], right=False)
    

    Output

       x  y
    0  0  0
    1 -3  1
    2  5 -1
    3 -1  0
    4  1  0
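
The sample data above does not contain the boundary values, so the following is a minimal sketch of how pd.cut with right=False treats x == -2 and x == 2 compared with the strict comparisons in the question. The column names y_cut, y_ref and y_fixed are made up for illustration, and nudging the upper edge with np.nextafter is only one possible workaround, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample data that includes the boundary values -2 and 2.
df = pd.DataFrame({'x': [-3, -2, 0, 2, 5]})

# Left-closed bins: [-inf, -2), [-2, 2), [2, inf)
df['y_cut'] = pd.cut(df['x'], [-np.inf, -2, 2, np.inf],
                     labels=[1, 0, -1], right=False)

# Reference result using the strict comparisons x < -2 and x > 2.
df['y_ref'] = np.where(df['x'] < -2, 1, np.where(df['x'] > 2, -1, 0))

# Possible workaround (an assumption, not from the answer): move the upper
# edge to the next float above 2 so that x == 2 stays in the middle bin.
upper = np.nextafter(2.0, np.inf)
df['y_fixed'] = pd.cut(df['x'], [-np.inf, -2, upper, np.inf],
                       labels=[1, 0, -1], right=False)

print(df)
print(df['y_cut'].dtype)  # category
```

The y_cut and y_ref columns differ only in the row where x is exactly 2. If an integer column is needed, the categorical result can be converted with astype(int).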
    
  • 2020-12-08 10:59

    One simple method is to assign the default value first and then overwrite the matching rows with two .loc assignments:

    In [66]:
    
    df = pd.DataFrame({'x':[0,-3,5,-1,1]})
    df
    Out[66]:
       x
    0  0
    1 -3
    2  5
    3 -1
    4  1
    
    In [69]:
    
    df['y'] = 0
    df.loc[df['x'] < -2, 'y'] = 1
    df.loc[df['x'] > 2, 'y'] = -1
    df
    Out[69]:
       x  y
    0  0  0
    1 -3  1
    2  5 -1
    3 -1  0
    4  1  0
    

    If you want to use np.where instead, you can nest a second np.where call inside the first:

    In [77]:
    
    df['y'] = np.where(df['x'] < -2 , 1, np.where(df['x'] > 2, -1, 0))
    df
    Out[77]:
       x  y
    0  0  0
    1 -3  1
    2  5 -1
    3 -1  0
    4  1  0
    

    Here the outer np.where returns 1 where x is less than -2. Everywhere else, the inner np.where tests whether x is greater than 2 and returns -1 if so, and 0 otherwise.
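
Nesting becomes harder to read as the number of conditions grows. As an alternative that is not part of the original answer, np.select takes a list of conditions and a matching list of values, plus a default. A minimal sketch on the same sample data (the names conditions and choices are illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'x': [0, -3, 5, -1, 1]})

# Conditions are checked in order; the first one that matches wins.
conditions = [df['x'] < -2, df['x'] > 2]
choices = [1, -1]

# Rows matching no condition receive the default value.
df['y'] = np.select(conditions, choices, default=0)
print(df)
```

This should give the same y column as the nested np.where call.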

    Timings:

    In [79]:
    
    %timeit df['y'] = np.where(df['x'] < -2 , 1, np.where(df['x'] > 2, -1, 0))
    
    1000 loops, best of 3: 1.79 ms per loop
    
    In [81]:
    
    %%timeit
    df['y'] = 0
    df.loc[df['x'] < -2, 'y'] = 1
    df.loc[df['x'] > 2, 'y'] = -1
    
    100 loops, best of 3: 3.27 ms per loop
    

    So for this sample dataset the np.where method is roughly twice as fast (1.79 ms versus 3.27 ms per loop).
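
The %timeit magics above only work in IPython. To repeat the comparison in a plain script, a rough sketch using the standard timeit module could look like the following. The data size, the random seed and the helper names with_where and with_loc are assumptions chosen for illustration, and the absolute numbers will vary by machine and library version.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical benchmark data: 100,000 random integers in [-5, 5].
rng = np.random.default_rng(0)
df = pd.DataFrame({'x': rng.integers(-5, 6, size=100_000)})


def with_where(frame):
    # Nested np.where, as in the answer.
    return np.where(frame['x'] < -2, 1, np.where(frame['x'] > 2, -1, 0))


def with_loc(frame):
    # Default value plus two .loc assignments, as in the answer.
    # The copy keeps the input untouched and adds a little overhead.
    out = frame.copy()
    out['y'] = 0
    out.loc[out['x'] < -2, 'y'] = 1
    out.loc[out['x'] > 2, 'y'] = -1
    return out['y'].to_numpy()


# Both approaches should agree before their speed is compared.
assert np.array_equal(with_where(df), with_loc(df))

runs = 20
for fn in (with_where, with_loc):
    seconds = timeit.timeit(lambda: fn(df), number=runs)
    print(f"{fn.__name__}: {seconds / runs * 1e3:.3f} ms per call")
```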
