Create new column with incremental values efficiently


攒了一身酷 2021-02-07 03:23

I am creating a column with incremental values and then appending a string at the start of the column. When used on large data this is very slow. Please suggest a faster and eff

4 answers

孤城傲影 (thread starter)

2021-02-07 03:46

When all else fails, use a list comprehension:

df['NewColumn'] = ['str_%s' %i for i in range(1, len(df) + 1)]
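The same comprehension can be wrapped in a small helper when the prefix or the starting number needs to vary. This is only a sketch: `make_labels`, `prefix`, and `start` are hypothetical names that do not appear in the original answer, and an f-string is used in place of `%` formatting.

```python
def make_labels(n, prefix='str_', start=1):
    """Return n labels: prefix + start, prefix + (start + 1), and so on."""
    # Hypothetical helper: same list comprehension as above, with an f-string.
    return [f'{prefix}{i}' for i in range(start, start + n)]


labels = make_labels(3)
print(labels)  # ['str_1', 'str_2', 'str_3']
```

Assigning the result with df['NewColumn'] = make_labels(len(df)) should give the same column as the one-liner above.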

Further speedups are possible if you cythonize your function:

%load_ext Cython

%%cython
def gen_list(l, h):
    return ['str_%s' %i for i in range(l, h)]

Note: this code was run on Python 3.6.0 (IPython 6.2.1). The solution was improved thanks to @hpaulj in the comments.

# @jezrael's fastest solution

%%timeit
df['NewColumn'] = np.arange(len(df['a'])) + 1
df['NewColumn'] = 'str_' + df['NewColumn'].map(str)

547 ms ± 13.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

# in this post - no cython

%timeit df['NewColumn'] = ['str_%s' % i for i in range(1, len(df) + 1)]
409 ms ± 9.36 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

# cythonized list comp 

%timeit df['NewColumn'] = gen_list(1, len(df) + 1)
370 ms ± 9.23 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
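
To reproduce a comparison like this outside IPython, the standard `timeit` module can stand in for the `%timeit` magic. The sketch below makes several assumptions: the row count `n` is arbitrary (the post does not state the size of the frame), the column `a` is filled with zeros, and `vectorized` and `list_comp` are hypothetical wrapper names for the first two approaches timed above. The Cython variant is left out because it needs a compilation step. Absolute numbers will differ by machine and library version.

```python
import timeit

import numpy as np
import pandas as pd

n = 100_000  # assumed size; the original post does not give the row count
df = pd.DataFrame({'a': np.zeros(n)})


def vectorized(frame):
    # arange + map(str), as in the first timing above
    numbers = pd.Series(np.arange(len(frame)) + 1, index=frame.index)
    return 'str_' + numbers.map(str)


def list_comp(frame):
    # plain list comprehension, as in the second timing above
    return ['str_%s' % i for i in range(1, len(frame) + 1)]


for func in (vectorized, list_comp):
    # best of 3 single runs, reported in milliseconds
    seconds = min(timeit.repeat(lambda: func(df), number=1, repeat=3))
    print(f'{func.__name__}: {seconds * 1000:.1f} ms')
```

Taking the minimum of a few repeats reduces noise from other processes; `%timeit` reports mean and standard deviation instead, so the figures are not directly comparable with those quoted above.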
