Given a data frame df:
id_ val
11111 12
12003 22
88763 19
43721 77
...
I wish to add a column
Recursive functions are not easily vectorisable. However, you can optimize your algorithm with numba. This should be preferable to a regular loop.
import numpy as np
from numba import jit

@jit(nopython=True)
def foo(val):
    # diff[i] depends on diff[i-1], so the array is filled sequentially
    diff = np.zeros(val.shape)
    diff[0] = val[0] * 0.4
    for i in range(1, diff.shape[0]):
        diff[i] = (val[i] - diff[i-1]) * 0.4 + diff[i-1]
    return diff

df['diff'] = foo(df['val'].values)
print(df)
id_ val diff
0 11111 12 4.8000
1 12003 22 11.6800
2 88763 19 14.6080
3 43721 77 39.5648
You can use:
# relies on the default integer index (0, 1, 2, ...), since .loc looks up labels
df.loc[0, 'diff'] = df.loc[0, 'val'] * 0.4
for i in range(1, len(df)):
    df.loc[i, 'diff'] = (df.loc[i, 'val'] - df.loc[i-1, 'diff']) * 0.4 + df.loc[i-1, 'diff']
print(df)
id_ val diff
0 11111 12 4.8000
1 12003 22 11.6800
2 88763 19 14.6080
3 43721 77 39.5648
The iterative nature of the calculation, where each input depends on the result of the previous step, complicates vectorization. You could perhaps use apply with a function that does the same calculation as the loop, but behind the scenes this would also be a loop.
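A minimal sketch of the apply idea mentioned above, for illustration only. The class name RunningDiff and its factor parameter are hypothetical, not part of pandas. It assumes Series.apply visits the values in order, and it starts the previous result at 0 so that the first row comes out as val[0] * 0.4, as in the loop versions.

```python
import pandas as pd

# Sample data taken from the question.
df = pd.DataFrame({"id_": [11111, 12003, 88763, 43721],
                   "val": [12, 22, 19, 77]})

class RunningDiff:
    """Hypothetical helper that remembers the previous result between calls."""

    def __init__(self, factor=0.4):
        self.factor = factor
        self.prev = 0.0  # starting at 0 makes the first result val[0] * factor

    def __call__(self, val):
        # Same update as the loop: (val - previous diff) * factor + previous diff
        self.prev = (val - self.prev) * self.factor + self.prev
        return self.prev

# Still a Python-level loop under the hood, just hidden inside apply.
df["diff"] = df["val"].apply(RunningDiff())
print(df)
```

This is no faster than the explicit loop. It only moves the state into an object, so a fresh RunningDiff() is needed for every call to apply.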
If you are using apply in pandas, you should not reference the dataframe again within the lambda function. In all cases, the object you work with inside the lambda function should be 'row'.
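As a rough illustration of that point, the sketch below computes a per-row value using only the row passed to the lambda. The column name scaled is made up for the example. Note that this pattern only covers calculations that need the current row alone, so it does not by itself reproduce the recursive diff column.

```python
import pandas as pd

# Sample data taken from the question.
df = pd.DataFrame({"id_": [11111, 12003, 88763, 43721],
                   "val": [12, 22, 19, 77]})

# axis=1 passes each row to the lambda as a Series.
# Only `row` is used inside the lambda; `df` is never referenced there.
df["scaled"] = df.apply(lambda row: row["val"] * 0.4, axis=1)
print(df)
```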