Creating a new column based on the values of other columns

孤街浪徒 2021-01-16 10:19

I want to create a "High Value Indicator" column that says "Y" or "N" based on two different value columns. I want the new column to have a "Y" when Value_1 is >

4 answers
  • 2021-01-16 10:54

    You can also use apply:

    df['High Value Indicator'] = (
         df.apply(lambda x: 'Y' if (x.Value_1>1000 or x.Value_2>15000) else 'N', axis=1)
         )
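
A self-contained sketch of the apply approach, for readers who want to run it as is. The sample data is copied from the frame printed in another answer on this page, and the helper name high_value_flag is hypothetical (it is not part of the original answer).

```python
import pandas as pd

# Sample data taken from the frame printed elsewhere in this thread.
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5],
    "Value_1": [100, 250, 625, 1500, 3750],
    "Value_2": [2500, 6250, 15625, 37500, 93750],
})


def high_value_flag(row):
    """Hypothetical helper: 'Y' if either value exceeds its threshold."""
    return "Y" if (row["Value_1"] > 1000 or row["Value_2"] > 15000) else "N"


# axis=1 passes one row at a time to the function, so this is a Python-level loop.
df["High Value Indicator"] = df.apply(high_value_flag, axis=1)
print(df)
```

Because the function is called once per row, this form is easy to read but tends to be much slower than a vectorized expression on large frames.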
    
  • 2021-01-16 10:57

    Use numpy.where, combining the conditions with | (element-wise or):

    df['High Value Indicator'] = np.where((df.Value_1 > 1000) | (df.Value_2 > 15000), 'Y', 'N')
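
For completeness, a runnable sketch of the numpy.where line with its imports and the sample frame, assuming the data shown in the printed output of this answer. The intermediate name mask is only for illustration.

```python
import numpy as np
import pandas as pd

# Sample data copied from the printed frame in this answer.
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5],
    "Value_1": [100, 250, 625, 1500, 3750],
    "Value_2": [2500, 6250, 15625, 37500, 93750],
})

# Each comparison yields a boolean Series. The parentheses are required
# because | binds more tightly than > in Python.
mask = (df["Value_1"] > 1000) | (df["Value_2"] > 15000)

# np.where picks 'Y' where the mask is True and 'N' elsewhere.
df["High Value Indicator"] = np.where(mask, "Y", "N")
print(df)
```

Bracket access (df["Value_1"]) is used here instead of attribute access (df.Value_1); both select the same column, but brackets also work for column names containing spaces.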
    

    Or map the boolean mask to labels with a dictionary:

    df['High Value Indicator'] = (
        ((df.Value_1 > 1000) | (df.Value_2 > 15000))
        .map({True: 'Y', False: 'N'})
    )
    
    print (df)
       ID  Value_1  Value_2 High Value Indicator
    0   1      100     2500                    N
    1   2      250     6250                    N
    2   3      625    15625                    Y
    3   4     1500    37500                    Y
    4   5     3750    93750                    Y
    

    Timings (the 5-row sample repeated 10,000 times, i.e. 50,000 rows):

    df = pd.concat([df] * 10000, ignore_index=True)
    

    In [76]: %timeit df['High Value Indicator1'] = np.where((df.Value_1 > 1000) | (df.Value_2 > 15000), 'Y', 'N')
    100 loops, best of 3: 4.03 ms per loop
    
    In [77]: %timeit df['High Value Indicator2'] = ((df.Value_1 > 1000) | (df.Value_2 > 15000)).map({True:'Y', False:'N'})
    100 loops, best of 3: 4.82 ms per loop
    
    In [78]: %%timeit
        ...: df.loc[((df['Value_1'] > 1000) 
        ...:        |(df['Value_2'] > 15000)), 'High_Value_Ind3'] = 'Y'
        ...: 
        ...: df['High_Value_Ind3'] = df['High_Value_Ind3'].fillna('N')
        ...: 
    100 loops, best of 3: 5.28 ms per loop
    
    
    In [79]: %timeit df['High Value Indicator'] = (df.apply(lambda x: 'Y' if (x.Value_1>1000 or x.Value_2>15000) else 'N', axis=1))
    1 loop, best of 3: 1.72 s per loop
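
The %timeit lines above need IPython. Outside IPython, a rough equivalent can be sketched with the standard-library timeit module. The function names with_where and with_apply are hypothetical, and absolute numbers will differ by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

base = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5],
    "Value_1": [100, 250, 625, 1500, 3750],
    "Value_2": [2500, 6250, 15625, 37500, 93750],
})
# Same enlargement as in the answer: 5 rows * 10000 = 50,000 rows.
df = pd.concat([base] * 10000, ignore_index=True)


def with_where():
    # Vectorized: one pass over whole columns.
    return np.where((df.Value_1 > 1000) | (df.Value_2 > 15000), "Y", "N")


def with_apply():
    # Row-wise: the lambda is called once per row.
    return df.apply(
        lambda x: "Y" if (x.Value_1 > 1000 or x.Value_2 > 15000) else "N",
        axis=1,
    )


t_where = timeit.timeit(with_where, number=10) / 10
t_apply = timeit.timeit(with_apply, number=1)
print(f"np.where: {t_where * 1e3:.2f} ms per call")
print(f"apply:    {t_apply:.2f} s per call")
```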
    
  • 2021-01-16 10:58

    Try using .loc and .fillna:

    df.loc[((df['Value_1'] > 1000) 
           |(df['Value_2'] > 15000)), 'High_Value_Ind'] = 'Y'
    
    df['High_Value_Ind'] = df['High_Value_Ind'].fillna('N')
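
The .loc assignment above creates the column with NaN in the unmatched rows, which is why the fillna step is needed. A possible variant, sketched here as an assumption rather than as part of the original answer, sets the default first and then overwrites the matching rows, so no NaN ever appears:

```python
import pandas as pd

# Sample data copied from the frame printed elsewhere in this thread.
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5],
    "Value_1": [100, 250, 625, 1500, 3750],
    "Value_2": [2500, 6250, 15625, 37500, 93750],
})

mask = (df["Value_1"] > 1000) | (df["Value_2"] > 15000)

# Start every row at the default label, then overwrite the matching rows.
df["High_Value_Ind"] = "N"
df.loc[mask, "High_Value_Ind"] = "Y"
print(df)
```

This ordering also behaves predictably if the column already exists, because every row is reset to "N" before the "Y" labels are written.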
    
  • 2021-01-16 11:02

    Using map on the boolean mask:

    df['High Value Indicator'] =((df.Value_1 > 1000) | (df.Value_2 > 15000)).map({True:'Y',False:'N'})
    df
    Out[849]: 
       ID  Value_1  Value_2 High Value Indicator
    0   1      100     2500                    N
    1   2      250     6250                    N
    2   3      625    15625                    Y
    3   4     1500    37500                    Y
    4   5     3750    93750                    Y
    