Create new column with incremental values efficiently

攒了一身酷 2021-02-07 03:23

I am creating a column of incremental values and then prepending a string to each value. On large data this is very slow. Please suggest a faster and eff

4 answers
  •  孤城傲影
    2021-02-07 03:46

    When all else fails, use a list comprehension:

    df['NewColumn'] = ['str_%s' %i for i in range(1, len(df) + 1)]
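
    As a quick check on small data, here is a minimal, self-contained sketch of the list-comprehension approach. The three-row frame and its column `a` are made up for illustration.

```python
import pandas as pd

# Hypothetical toy frame; any DataFrame works the same way.
df = pd.DataFrame({'a': [10, 20, 30]})

# Build all labels in one pass and assign the list as a new column.
df['NewColumn'] = ['str_%s' % i for i in range(1, len(df) + 1)]

print(df['NewColumn'].tolist())
```

    Labels start at 1 because the range runs from 1 to len(df) inclusive.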
    

    Further speedups are possible if you cythonize your function:

    %load_ext Cython
    
    %%cython
    def gen_list(l, h):
        return ['str_%s' %i for i in range(l, h)]
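
    To check the output without compiling anything, a plain-Python stand-in with the same arguments could look like this. It is only a sketch: it will not show the Cython speedup, and the name `gen_list_py` is made up here.

```python
def gen_list_py(l, h):
    # Same half-open range as gen_list: labels for l, l + 1, ..., h - 1.
    return ['str_%s' % i for i in range(l, h)]


labels = gen_list_py(1, 4)
print(labels)
```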
    

    Note: this code was run on Python 3.6.0 (IPython 6.2.1). The solution was improved thanks to @hpaulj in the comments.


    # @jezrael's fastest solution
    
    %%timeit
    df['NewColumn'] = np.arange(len(df['a'])) + 1
    df['NewColumn'] = 'str_' + df['NewColumn'].map(str)
    
    547 ms ± 13.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
    

    # in this post - no cython
    
    %timeit df['NewColumn'] = ['str_%s' % i for i in range(1, len(df) + 1)]
    409 ms ± 9.36 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
    

    # cythonized list comp 
    
    %timeit df['NewColumn'] = gen_list(1, len(df) + 1)
    370 ms ± 9.23 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
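
    Outside IPython, the comparison can be reproduced roughly with the standard `timeit` module. This is a sketch under assumptions: the frame size `n` and the helper names `vectorized` and `list_comp` are made up here, the original benchmark size is not stated, and timings will vary by machine.

```python
import timeit

import numpy as np
import pandas as pd

n = 100_000  # assumed size; the original benchmark size is not stated
df = pd.DataFrame({'a': np.zeros(n)})


def vectorized(frame):
    # arange + map(str), mirroring the first benchmark above.
    counter = pd.Series(np.arange(len(frame)) + 1, index=frame.index)
    return 'str_' + counter.map(str)


def list_comp(frame):
    # Plain list comprehension, mirroring the second benchmark above.
    return ['str_%s' % i for i in range(1, len(frame) + 1)]


for func in (vectorized, list_comp):
    # Best of three single runs, reported in milliseconds.
    seconds = min(timeit.repeat(lambda: func(df), number=1, repeat=3))
    print(f'{func.__name__}: {seconds * 1000:.1f} ms')
```

    Both helpers only build the labels; assign the result to df['NewColumn'] inside the timed call if the assignment cost should be included as well.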
    
