I am creating a column with incremental values and then appending a string at the start of the column. When used on large data this is very slow. Please suggest a faster and eff
When all else fails, use a list comprehension:
df['NewColumn'] = ['str_%s' %i for i in range(1, len(df) + 1)]
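As a minimal sketch of the same idea outside a DataFrame, the helper below builds the labels with a plain list comprehension, numbering from 1 as in the line above. The name `make_labels` and its `prefix` and `start` parameters are hypothetical, introduced only for illustration.

```python
def make_labels(n, prefix='str_', start=1):
    """Return n labels: prefix+start, prefix+(start+1), ... (hypothetical helper)."""
    # Same %-style formatting as the one-liner above, with the prefix made configurable.
    return ['%s%s' % (prefix, i) for i in range(start, start + n)]


labels = make_labels(3)
print(labels)  # ['str_1', 'str_2', 'str_3']
```

With a DataFrame `df`, the result can be assigned directly: `df['NewColumn'] = make_labels(len(df))`.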
Further speedups are possible if you cythonize your function:
%load_ext Cython
%%cython
def gen_list(l, h):
    return ['str_%s' %i for i in range(l, h)]
Note: this code was run on Python 3.6.0 (IPython 6.2.1). The solution was improved thanks to @hpaulj in the comments.
# @jezrael's fastest solution
%%timeit
df['NewColumn'] = np.arange(len(df['a'])) + 1
df['NewColumn'] = 'str_' + df['NewColumn'].map(str)
547 ms ± 13.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
# in this post - no cython
%timeit df['NewColumn'] = ['str_%s'%i for i in range(1, len(df) + 1)]
409 ms ± 9.36 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
# cythonized list comp
%timeit df['NewColumn'] = gen_list(1, len(df) + 1)
370 ms ± 9.23 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
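To rerun this kind of comparison outside IPython, something like the following sketch with the standard `timeit` module may help. The row count `N` is an assumption (the DataFrame size is not stated above), and the f-string and concatenation variants are additions for comparison, not part of the original answer. No timings are claimed here; results will vary by machine and Python version.

```python
import timeit

N = 100_000  # assumed row count; the post does not state the DataFrame size


def percent_format(n):
    # The %-formatting list comprehension from this post.
    return ['str_%s' % i for i in range(1, n + 1)]


def f_string(n):
    # Added variant: f-strings (Python 3.6+).
    return [f'str_{i}' for i in range(1, n + 1)]


def concat(n):
    # Added variant: explicit str() plus concatenation.
    return ['str_' + str(i) for i in range(1, n + 1)]


candidates = [percent_format, f_string, concat]

# All variants must produce identical labels before timings mean anything.
expected = percent_format(N)
assert all(fn(N) == expected for fn in candidates)

for fn in candidates:
    # number=1 matches the "1 loop each" runs above; the best of 5 runs is reported.
    best = min(timeit.repeat(lambda: fn(N), number=1, repeat=5))
    print(f'{fn.__name__}: {best * 1e3:.1f} ms')
```

Whichever variant comes out fastest, the resulting list is assigned to the column the same way as before.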