Reset cumulative sum based on a condition in pandas

挽巷 2021-01-18 10:56

I have a data frame like:

customer spend hurdle 
A         20    50      
A         31    50      
A         20    50      
B         50    100     
B                

2 Answers
  • 2021-01-18 11:15

    One way is the code below, although it is a really inefficient and inelegant one-liner.

    df1.groupby('customer').apply(lambda x: (x['spend'].cumsum() *(x['spend'].cumsum() > x['hurdle']).astype(int).shift(-1)).fillna(x['spend']))
    
  • 2021-01-18 11:26

    There may be a faster, more efficient way. Here is one inefficient approach using apply.

    In [3270]: def custcum(x):
          ...:     total = 0
          ...:     for i, v in x.iterrows():
          ...:         total += v.spend
          ...:         x.loc[i, 'cum'] = total
          ...:         if total >= v.hurdle:
          ...:             total = 0
          ...:     return x
          ...:
    
    In [3271]: df.groupby('customer').apply(custcum)
    Out[3271]:
      customer  spend  hurdle    cum
    0        A     20      50   20.0
    1        A     31      50   51.0
    2        A     20      50   20.0
    3        B     50     100   50.0
    4        B     51     100  101.0
    5        B     30     100   30.0
    

    You may consider using Cython or Numba to speed up custcum.
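
As a rough sketch of that suggestion, the loop in custcum could be moved onto plain NumPy arrays, which is the form Numba can compile. The name reset_cumsum, the optional-import fallback, and the requirement that each customer's rows are contiguous are assumptions made for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd

try:
    from numba import njit  # optional; assumed to be installed for the speed-up
except ImportError:
    def njit(func):
        # Fallback so the sketch still runs as plain Python without numba
        return func


@njit
def reset_cumsum(codes, spend, hurdle):
    # Running total per customer that restarts after the hurdle is reached.
    # Assumes rows of the same customer are adjacent (hypothetical helper).
    out = np.empty(spend.shape[0], dtype=np.float64)
    total = 0.0
    for i in range(spend.shape[0]):
        # Start over at the first row of each customer
        if i > 0 and codes[i] != codes[i - 1]:
            total = 0.0
        total += spend[i]
        out[i] = total
        # Same reset rule as custcum: reset once total >= hurdle
        if total >= hurdle[i]:
            total = 0.0
    return out


df = pd.DataFrame({
    "customer": ["A", "A", "A", "B", "B", "B"],
    "spend": [20, 31, 20, 50, 51, 30],
    "hurdle": [50, 50, 50, 100, 100, 100],
})

# Integer codes stand in for the customer labels inside the compiled loop
codes, _ = pd.factorize(df["customer"])
df["cum"] = reset_cumsum(
    codes.astype(np.int64),
    df["spend"].to_numpy(dtype=np.float64),
    df["hurdle"].to_numpy(dtype=np.float64),
)
print(df["cum"].tolist())  # [20.0, 51.0, 20.0, 50.0, 101.0, 30.0]
```

If the frame is not already grouped by customer, sorting it by customer first (with a stable sort, to keep the purchase order) would be needed for this sketch to apply.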


    [Update]

    Improved version of Ido's answer.

    In [3276]: s = df.groupby('customer').spend.cumsum()
    
    In [3277]: np.where(s > df.hurdle.shift(-1), s, df.spend)
    Out[3277]: array([ 20,  51,  20,  50, 101,  30], dtype=int64)
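
Since the np.where expression never resets the running total explicitly, it may be worth comparing it against a plain loop before relying on it. The sketch below does that on the sample data and on a small hypothetical frame. The helper names loop_reset_cumsum and where_shortcut and the frame small are made up for this check, and the loop assumes each customer's rows are contiguous.

```python
import numpy as np
import pandas as pd


def loop_reset_cumsum(df):
    # Reference result: per-customer running total, reset once it reaches the hurdle
    out = []
    for _, group in df.groupby("customer", sort=False):
        total = 0
        for spend, hurdle in zip(group["spend"], group["hurdle"]):
            total += spend
            out.append(total)
            if total >= hurdle:
                total = 0
    return np.array(out)


def where_shortcut(df):
    # The np.where expression from the update, wrapped in a function
    s = df.groupby("customer").spend.cumsum()
    return np.where(s > df.hurdle.shift(-1), s, df.spend)


sample = pd.DataFrame({
    "customer": list("AAABBB"),
    "spend": [20, 31, 20, 50, 51, 30],
    "hurdle": [50, 50, 50, 100, 100, 100],
})
# Hypothetical data: three small purchases that never reach the hurdle
small = pd.DataFrame({
    "customer": list("AAA"),
    "spend": [10, 10, 10],
    "hurdle": [50, 50, 50],
})

print(np.array_equal(loop_reset_cumsum(sample), where_shortcut(sample)))  # True
print(np.array_equal(loop_reset_cumsum(small), where_shortcut(small)))    # False
```

On the hypothetical frame the loop gives 10, 20, 30 while the shortcut gives 10, 10, 10, so the two agree on the sample data but not in general.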
    