How to compute a new column based on the values of other columns in pandas - python

佛祖请我去吃肉 2021-01-13 01:17

Let's say my data frame contains these data:

>>> df = pd.DataFrame({'a':['l1','l2','l1','l2','l1','l2'],
                       'b'


        
4 answers
  •  心在旅途
    2021-01-13 02:14

    import numpy
    import pandas as pd
    df = pd.DataFrame({'a': numpy.random.choice(['l1', 'l2'], 1000000),
                       'b': numpy.random.choice(['1', '2'], 1000000)})
    

    A fast solution, assuming each column holds only two distinct values:

    %timeit df['c'] = ((df.a == 'l1') == (df.b == '1')).astype(int)
    

    10 loops, best of 3: 178 ms per loop
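The two-value assumption matters: the shortcut only checks whether the tests `a == 'l1'` and `b == '1'` agree, so any row where both are false counts as a match. A minimal sketch of where that goes wrong, using a made-up third label `'l3'` that is not part of the original data:

```python
import pandas as pd

# Hypothetical frame: the label 'l3' breaks the two-value assumption.
df = pd.DataFrame({'a': ['l1', 'l2', 'l3'],
                   'b': ['1', '2', '2']})

# Shortcut: 1 whenever both comparisons give the same truth value.
shortcut = ((df.a == 'l1') == (df.b == '1')).astype(int)
# General form: compare the last character of 'a' with 'b'.
general = (df.a.str[-1] == df.b).astype(int)

print(shortcut.tolist())  # [1, 1, 1]  (the 'l3'/'2' row is wrongly marked as a match)
print(general.tolist())   # [1, 1, 0]
```

With only `'l1'`/`'l2'` and `'1'`/`'2'` present, the two expressions give the same column, which is why the shortcut is safe for the benchmark data above.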

    @Viktor Kerkes:

    %timeit df['c'] = (df.a.str[-1] == df.b).astype(int)
    

    1 loops, best of 3: 412 ms per loop

    @user1470788:

    %timeit df['c'] = (((df['a'] == 'l1')&(df['b']=='1'))|((df['a'] == 'l2')&(df['b']=='2'))).astype(int)
    

    1 loops, best of 3: 363 ms per loop

    @herrfz:

    %timeit df['c'] = (df.a.apply(lambda x: x[1:])==df.b).astype(int)
    

    1 loops, best of 3: 387 ms per loop
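The timings are only comparable if all four expressions produce the same column. A small sketch to check that, assuming (as the answers imply, since the question text is cut off) that `c` should be 1 when the digit in `a` matches `b` and 0 otherwise; the frame size and the seed are arbitrary choices for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: 'a' holds 'l1'/'l2', 'b' holds '1'/'2' as strings.
rng = np.random.default_rng(0)
df = pd.DataFrame({'a': rng.choice(['l1', 'l2'], 20),
                   'b': rng.choice(['1', '2'], 20)})

# Two-value shortcut: the rows match exactly when both tests agree.
c_fast = ((df.a == 'l1') == (df.b == '1')).astype(int)
# Compare the last character of 'a' with 'b'.
c_str = (df.a.str[-1] == df.b).astype(int)
# Spell out both matching combinations explicitly.
c_bool = (((df['a'] == 'l1') & (df['b'] == '1')) |
          ((df['a'] == 'l2') & (df['b'] == '2'))).astype(int)
# Strip the leading 'l' with a Python-level lambda.
c_apply = (df.a.apply(lambda x: x[1:]) == df.b).astype(int)

all_agree = (c_fast.equals(c_str)
             and c_fast.equals(c_bool)
             and c_fast.equals(c_apply))
print(all_agree)
```

The first three expressions stay in vectorized pandas operations, while the `apply` version calls a Python function once per row.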
