How to build a Python function with a rolling total?

人走茶凉 提交于 2021-02-04 07:38:32

问题


I am looking to build a function that creates a rolling total by code by day for the below DataFrames, where the In for each code on a date is subtracted from the Out for each code on a date, this subtotal is the subtracted from the previous days total but the total must be >=0 ( I have included an example of this it the Desired Output below).

Below are an example of my inputs and the function I am using along with an example of my desired output.

df1 - In

s = """        Date    Code    Quantity
0   10/01/2019  A   20
3   11/01/2019  A   2
7   12/01/2019  A   4
11  13/01/2019  A   10
"""

df2 - Out

s ='''    Date     Code   Quantity
0   11/01/2019  A   5
3   12/01/2019  A   100
4   15/01/2019  A   1
6   16/01/2019  A   2
'''

Code

df3 = df1.merge(df2, how='outer', left_on=['date', 'code'], right_on=['date', 'code']).fillna(0)
df3['qty1'] = df3['qty_x'] - df3['qty_y']
df3['qty'] = 0
def final_adder(x):
    x.qty_x = x.qty_x
    print(x)
    return x
df_final = df3.groupby(['code']).apply(final_adder)
df_final['qty'] = df_final['qty'].clip(lower=0)
df_final.drop(['qty_x', 'qty_y','qty1'], inplace=True, axis=1)


          date code  qty_x  qty_y qty1  qty
0   10/01/2019   A   20.0    0.0  20.0    0
3   11/01/2019   A    2.0    5.0  -3.0    0
7   12/01/2019   A    4.0  100.0 -96.0    0
11  13/01/2019   A   10.0    0.0  10.0    0

Desired Output

s = """        Date    Code    Quantity
0   10/01/2019  A   20
3   11/01/2019  A   17
7   12/01/2019  A   0
11  13/01/2019  A   10
12  14/01/2019  A   10
15  15/01/2019  A   9
16  16/01/2019  A   7
"""

回答1:


There is a whole sub-genre of questions on SO involving cumulative operations with limits (e.g.: "reset to zero when the cumulative sum becomes negative"). This is not the same as cumulative operation with known reset points (e.g. from another column, or where there are NaNs, etc.) because the condition involves the cumulative value itself.

There is no clean way in the current pandas or numpy to do that in a vectorized way.

The best (fastest) way I am aware of for this is this SO answer involving numba. Slightly modified and adapted to your problem:

from numba import njit
@njit
def poscumsum(x):
    total = 0
    result = np.empty(x.shape)
    for i, y in enumerate(x):
        total += y
        if total < 0:
            total = 0
        result[i] = total
    return result

With this, you can do:

a = df1.set_index(['Code', 'Date'])
b = df2.set_index(['Code', 'Date'])
idx = a.index.union(b.index).sort_values()
df3 = (a.reindex(idx, fill_value=0) - b.reindex(idx, fill_value=0))
# optional: resample Date to daily within each group:
df3 = df3.groupby('Code').resample('D', level='Date').sum()
df3['Quantity'] = df3.groupby('Code')['Quantity'].transform(
    lambda g: poscumsum(g.values))

On the data provided in the question:

>>> df3
                 Quantity
Code Date                
A    2019-01-10        20
     2019-01-11        17
     2019-01-12         0
     2019-01-13        10
     2019-01-14        10
     2019-01-15         9
     2019-01-16         7

If you prefer, you can also use merge. Here is an example where all the intermediary results are kept (for forensic analysis):

df3 = df1.merge(df2, on=['Code', 'Date'], how='outer', sort=True).fillna(0)
# optional: resample Date to daily within each group:
df3 = df3.set_index(['Code', 'Date']).groupby('Code').resample('D', level='Date').sum()
df3['diff'] = df3['Quantity_x'] - df3['Quantity_y']
df3['cumdiff'] = df3.groupby('Code')['diff'].transform(
   lambda g: poscumsum(g.values))

df3
# out:
                 Quantity_x  Quantity_y  diff  cumdiff
Code Date                                             
A    2019-01-10        20.0         0.0  20.0     20.0
     2019-01-11         2.0         5.0  -3.0     17.0
     2019-01-12         4.0       100.0 -96.0      0.0
     2019-01-13        10.0         0.0  10.0     10.0
     2019-01-14         0.0         0.0   0.0     10.0
     2019-01-15         0.0         1.0  -1.0      9.0
     2019-01-16         0.0         2.0  -2.0      7.0


来源:https://stackoverflow.com/questions/65411541/how-to-build-a-python-function-with-a-rolling-total

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!