I have a data frame like:
customer spend hurdle
A 20 50
A 31 50
A 20 50
B 50 100
B
One way is the code below, though it's a really inefficient and inelegant one-liner.
df1.groupby('customer').apply(lambda x: (x['spend'].cumsum() *(x['spend'].cumsum() > x['hurdle']).astype(int).shift(-1)).fillna(x['spend']))
There may be a faster, more efficient way. Here's one (inefficient) way to do it with apply:
In [3270]: def custcum(x):
      ...:     total = 0
      ...:     for i, v in x.iterrows():
      ...:         total += v.spend
      ...:         x.loc[i, 'cum'] = total
      ...:         if total >= v.hurdle:
      ...:             total = 0
      ...:     return x
      ...:
In [3271]: df.groupby('customer').apply(custcum)
Out[3271]:
  customer  spend  hurdle    cum
0        A     20      50   20.0
1        A     31      50   51.0
2        A     20      50   20.0
3        B     50     100   50.0
4        B     51     100  101.0
5        B     30     100   30.0
You may consider using Cython or Numba to speed up custcum.
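As a rough sketch of what a Cython- or Numba-friendly version of custcum might look like, here is the same reset-on-hurdle running total written as a plain loop over columns instead of over DataFrame rows. The function name reset_cumsum and its list arguments are my own, for illustration only (they are not part of pandas or Numba), and it assumes the rows for each customer are contiguous, as in the example data.

```python
def reset_cumsum(customers, spend, hurdle):
    """Running total of spend per customer, restarting after the hurdle is reached.

    Hypothetical helper for illustration; assumes rows of the same
    customer are adjacent (e.g. the frame is sorted by customer).
    """
    out = []
    total = 0
    prev = None
    for cust, s, h in zip(customers, spend, hurdle):
        if cust != prev:
            # New customer: start a fresh running total.
            total = 0
            prev = cust
        total += s
        out.append(total)
        if total >= h:
            # Hurdle reached: the next row starts from zero again.
            total = 0
    return out


# Same data as the example frame above.
customers = ["A", "A", "A", "B", "B", "B"]
spend = [20, 31, 20, 50, 51, 30]
hurdle = [50, 50, 50, 100, 100, 100]

result = reset_cumsum(customers, spend, hurdle)
print(result)  # [20, 51, 20, 50, 101, 30]
```

With a DataFrame you could pass df['customer'], df['spend'] and df['hurdle'] and assign the returned list as a new column. Compiling a loop like this with Numba would probably need numeric arrays (for example integer customer codes) rather than strings; I haven't tested that here.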
[Update]
Improved version of Ido's answer.
In [3276]: s = df.groupby('customer').spend.cumsum()
In [3277]: np.where(s > df.hurdle.shift(-1), s, df.spend)
Out[3277]: array([ 20, 51, 20, 50, 101, 30], dtype=int64)