How to compute a new column based on the values of other columns in pandas (Python)

佛祖请我去吃肉 2021-01-13 01:17

Let's say my data frame contains these data:

>>> df = pd.DataFrame({'a':['l1','l2','l1','l2','l1','l2'],
                       'b'


        
4 Answers
  • 2021-01-13 01:56

    You can also use the string methods.

    df['c'] = (df.a.str[-1] == df.b).astype(int)
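
To make the string-method approach concrete, here is a minimal sketch on a small frame. Column `a` mirrors the question; the values of `b` are an assumption made up for illustration, since the question's data is cut off.

```python
import pandas as pd

# 'a' follows the question; the 'b' values are assumed for illustration only.
df = pd.DataFrame({'a': ['l1', 'l2', 'l1', 'l2', 'l1', 'l2'],
                   'b': ['1', '2', '2', '1', '1', '2']})

# .str[-1] takes the last character of every string in 'a';
# comparing it with 'b' gives a boolean Series, which astype(int) turns into 0/1.
df['c'] = (df['a'].str[-1] == df['b']).astype(int)

print(df)  # column c is 1, 1, 0, 0, 1, 1
```

Note that `.str[-1]` only looks at the final character, so it presumably fits labels with a single trailing digit.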
    
  • 2021-01-13 02:02

    df['c'] = (df.a.apply(lambda x: x[1:])==df.b).astype(int)
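
The slice `x[1:]` drops the first character and keeps everything after it, which differs from taking only the last character when a label has more than one digit. The sketch below uses hypothetical labels such as `'l10'` that do not appear in the question.

```python
import pandas as pd

# Hypothetical data with a two-digit label; not taken from the question.
df = pd.DataFrame({'a': ['l1', 'l10', 'l2'],
                   'b': ['1', '10', '1']})

# x[1:] removes the leading 'l' and keeps all remaining characters.
tail = (df['a'].apply(lambda x: x[1:]) == df['b']).astype(int)

# .str[-1] keeps only the final character, so 'l10' becomes '0'.
last = (df['a'].str[-1] == df['b']).astype(int)

print(tail.tolist())  # [1, 1, 0]
print(last.tolist())  # [1, 0, 0]
```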

  • 2021-01-13 02:02

    You can just use logical operators. I'm not sure why you're using the strings '1' and '2' rather than ints, but here's a solution. The astype(int) at the end converts the boolean result to 0s and 1s.

    df['c'] = (((df['a'] == 'l1')&(df['b']=='1'))|((df['a'] == 'l2')&(df['b']=='2'))).astype(int)
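
If the list of valid pairs grows, spelling out every `&` and `|` term gets long. One possible alternative, not part of the answer above, is to put the pairs in a dictionary and use `Series.map`. The sample data and the `expected` lookup below are assumptions for illustration.

```python
import pandas as pd

# Sample data assumed for illustration.
df = pd.DataFrame({'a': ['l1', 'l2', 'l1', 'l2'],
                   'b': ['1', '2', '2', '1']})

# The explicit version: each comparison needs its own parentheses,
# because & and | bind more tightly than == in Python.
explicit = (((df['a'] == 'l1') & (df['b'] == '1')) |
            ((df['a'] == 'l2') & (df['b'] == '2'))).astype(int)

# Hypothetical lookup table: which 'b' value counts as a match for each 'a'.
expected = {'l1': '1', 'l2': '2'}
mapped = (df['a'].map(expected) == df['b']).astype(int)

print(explicit.tolist())  # [1, 1, 0, 0]
print(mapped.tolist())    # [1, 1, 0, 0]
```

Values of `a` missing from the dictionary map to NaN, which never compares equal, so those rows get 0.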

  • 2021-01-13 02:14
    import numpy
    import pandas as pd
    df = pd.DataFrame({'a': numpy.random.choice(['l1', 'l2'], 1000000),
                       'b': numpy.random.choice(['1', '2'], 1000000)})
    

    A fast solution assuming only two distinct values:

    %timeit df['c'] = ((df.a == 'l1') == (df.b == '1')).astype(int)
    

    10 loops, best of 3: 178 ms per loop
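
The reason this works with two distinct values: comparing the two masks for equality yields True when both are True (`l1` with `1`) or both are False (`l2` with `2`). The sketch below, on made-up data, shows the same result as the general approach for two values and where the shortcut goes wrong once a hypothetical third label `'l3'` appears.

```python
import pandas as pd

# Made-up data; the last row uses a third label that the shortcut does not expect.
df = pd.DataFrame({'a': ['l1', 'l2', 'l1', 'l2', 'l3'],
                   'b': ['1', '2', '2', '1', '2']})

# Shortcut: equality of the two boolean masks.
fast = ((df['a'] == 'l1') == (df['b'] == '1')).astype(int)

# General comparison of the trailing character with 'b'.
general = (df['a'].str[-1] == df['b']).astype(int)

print(fast.tolist())     # [1, 1, 0, 0, 1]  (last row is a false match)
print(general.tolist())  # [1, 1, 0, 0, 0]
```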

    @Viktor Kerkes:

    %timeit df['c'] = (df.a.str[-1] == df.b).astype(int)
    

    1 loops, best of 3: 412 ms per loop

    @user1470788:

    %timeit df['c'] = (((df['a'] == 'l1')&(df['b']=='1'))|((df['a'] == 'l2')&(df['b']=='2'))).astype(int)
    

    1 loops, best of 3: 363 ms per loop

    @herrfz:

    %timeit df['c'] = (df.a.apply(lambda x: x[1:])==df.b).astype(int)
    

    1 loops, best of 3: 387 ms per loop
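
The `%timeit` lines above only run inside IPython. As a rough sketch, the same comparison can be reproduced in a plain script with the standard `timeit` module. The row count, the seed, and the `candidates` names are assumptions chosen here; absolute timings will differ by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily
n = 100_000                     # smaller than the 1,000,000 rows above, to run quickly
df = pd.DataFrame({'a': rng.choice(['l1', 'l2'], n),
                   'b': rng.choice(['1', '2'], n)})

# The four approaches from this thread, wrapped as zero-argument callables.
candidates = {
    'mask equality': lambda: ((df.a == 'l1') == (df.b == '1')).astype(int),
    'str[-1]': lambda: (df.a.str[-1] == df.b).astype(int),
    'logical operators': lambda: (((df['a'] == 'l1') & (df['b'] == '1')) |
                                  ((df['a'] == 'l2') & (df['b'] == '2'))).astype(int),
    'apply + slice': lambda: (df.a.apply(lambda x: x[1:]) == df.b).astype(int),
}

reference = candidates['str[-1]']()
for name, fn in candidates.items():
    # Check that every approach agrees before comparing speed.
    assert (fn() == reference).all()
    best = min(timeit.repeat(fn, number=1, repeat=3))
    print(f'{name}: {best * 1000:.1f} ms')
```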
