Question
I am trying to apply a custom rolling function, with a 3-year window, to a pandas DataFrame.
import numpy as np
import pandas as pd

# Dummy data
df = pd.DataFrame({'Product': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'Year': [2015, 2016, 2017, 2018, 2015, 2016, 2017, 2018],
                   'IB': [2, 5, 8, 10, 7, 5, 10, 14],
                   'OB': [5, 8, 10, 12, 5, 10, 14, 20],
                   'Delta': [2, 2, 1, 3, -1, 3, 2, 4]})

# The function to be applied to each window;
# ib, ob and delta are expected to be indexable by position
def get_ln_rate(ib, ob, delta):
    n_years = len(ib)
    return sum(delta)*np.log(ob[-1]/ib[0]) / (n_years * (ob[-1] - ib[0]))
The expected output is
  Product  Year  IB  OB  Delta  Ln_Rate
0       A  2015   2   5      2
1       A  2016   5   8      2
2       A  2017   8  10      1   0.3353
3       A  2018  10  12      3   0.2501
4       B  2015   7   5     -1
5       B  2016   5  10      3
6       B  2017  10  14      2   0.1320
7       B  2018  14  20      4   0.2773
I have tried
df['Ln_Rate'] = df.groupby('Product').rolling(3).apply(lambda x: get_ln_rate(x['IB'], x['OB'], x['Delta']))
But this does not work.
I have found several similar posts:
applying custom rolling function to dataframe - this one does not have a clear answer
Pandas Rolling Apply custom - this one does not use multiple arguments
apply custom function on pandas dataframe on a rolling window - this one uses rolling.apply, but it doesn't show the syntax
None of them is spot on. Any pointers towards the correct syntax would be greatly appreciated.
Answer 1:
I solved this by reusing the rolling window.
import numpy as np

WINDOW_SIZE = 3
rw = df.groupby('Product').rolling(WINDOW_SIZE)
# raw=True passes each window as a NumPy array, so xs[0] and xs[-1] index by position
df = df.assign(delta_sum=rw['Delta'].sum().reset_index()['Delta'],
               ib_first=rw['IB'].apply(lambda xs: xs[0], raw=True).reset_index()['IB'],
               ob_last=rw['OB'].apply(lambda xs: xs[-1], raw=True).reset_index()['OB'])
df['ln_rate'] = df['delta_sum']*np.log(df['ob_last']/df['ib_first']) / (WINDOW_SIZE * (df['ob_last'] - df['ib_first']))
Which yields:
  Product  Year  IB  OB  Delta  delta_sum  ib_first  ob_last   ln_rate
0       A  2015   2   5      2        NaN       NaN      NaN       NaN
1       A  2016   5   8      2        NaN       NaN      NaN       NaN
2       A  2017   8  10      1        5.0       2.0     10.0  0.335300
3       A  2018  10  12      3        6.0       5.0     12.0  0.250134
4       B  2015   7   5     -1        NaN       NaN      NaN       NaN
5       B  2016   5  10      3        NaN       NaN      NaN       NaN
6       B  2017  10  14      2        4.0       7.0     14.0  0.132028
7       B  2018  14  20      4        9.0       5.0     20.0  0.277259
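Resetting the index is necessary because the grouped rolling results are indexed by (Product, original row), and they have to be brought back to the shape of the initial DataFrame before they can be assigned as columns. Note that this relies on the rows already being sorted by Product and carrying the default 0..n-1 index.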
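The same three building blocks (sum of Delta, first IB, last OB) can also be obtained without rolling.apply. The following is only a sketch under the assumption that every window is exactly WINDOW_SIZE consecutive rows within a product; the intermediate names grouped, delta_sum, ib_first and ob_last are illustrative. The first IB of a window is taken with a grouped shift, and the last OB is simply the current row.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Product': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'Year': [2015, 2016, 2017, 2018, 2015, 2016, 2017, 2018],
                   'IB': [2, 5, 8, 10, 7, 5, 10, 14],
                   'OB': [5, 8, 10, 12, 5, 10, 14, 20],
                   'Delta': [2, 2, 1, 3, -1, 3, 2, 4]})

WINDOW_SIZE = 3
grouped = df.groupby('Product')

# Rolling sum of Delta per product; dropping the group level of the
# resulting MultiIndex restores the original row labels.
delta_sum = (grouped['Delta'].rolling(WINDOW_SIZE).sum()
             .reset_index(level=0, drop=True))

# The first IB of the window ending at a row is the IB found
# WINDOW_SIZE - 1 rows earlier within the same product.
ib_first = grouped['IB'].shift(WINDOW_SIZE - 1)

# The last OB of the window is the OB of the current row.
ob_last = df['OB']

df['Ln_Rate'] = (delta_sum * np.log(ob_last / ib_first)
                 / (WINDOW_SIZE * (ob_last - ib_first)))
print(df)
```

Because every intermediate Series keeps the original row labels, the assignment aligns by index rather than by position, so it does not depend on the rows being sorted by Product.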
Hope that helps.
Answer 2:
Another approach came to mind: create rolling windows over the index labels of each group, and pass the corresponding partial DataFrames to your custom function. Strictly speaking, the function is not called with multiple arguments, but it still receives all the data it needs.
import numpy as np
import pandas as pd

df = pd.DataFrame({'Product': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'Year': [2015, 2016, 2017, 2018, 2015, 2016, 2017, 2018],
                   'IB': [2, 5, 8, 10, 7, 5, 10, 14],
                   'OB': [5, 8, 10, 12, 5, 10, 14, 20],
                   'Delta': [2, 2, 1, 3, -1, 3, 2, 4]})

# The function to be applied; it receives the rows of one window as a DataFrame
def get_ln_rate(df):
    n_years = len(df['IB'])
    return df['Delta'].sum() * np.log(df['OB'].iloc[-1] / df['IB'].iloc[0]) / (n_years * (df['OB'].iloc[-1] - df['IB'].iloc[0]))

# Roll over each group's index labels and use every window of labels to select rows.
# Rolling hands the labels over as floats, hence the cast back to int.
ln_rate = df.groupby('Product').apply(
    lambda grp: pd.Series(grp.index).rolling(3).agg(
        {'Ln_Rate': lambda window: get_ln_rate(grp.loc[window.astype(int)])})
).reset_index()['Ln_Rate']
df = df.assign(Ln_Rate=ln_rate)
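If keeping the three-argument signature from the question matters, one option is to slice the windows explicitly instead of going through rolling. This is a minimal sketch, not a built-in pandas feature: rolling_ln_rate is a hypothetical helper introduced here for illustration, and get_ln_rate is assumed to receive NumPy arrays.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Product': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'Year': [2015, 2016, 2017, 2018, 2015, 2016, 2017, 2018],
                   'IB': [2, 5, 8, 10, 7, 5, 10, 14],
                   'OB': [5, 8, 10, 12, 5, 10, 14, 20],
                   'Delta': [2, 2, 1, 3, -1, 3, 2, 4]})

def get_ln_rate(ib, ob, delta):
    # ib, ob and delta are NumPy arrays covering one window
    n_years = len(ib)
    return delta.sum() * np.log(ob[-1] / ib[0]) / (n_years * (ob[-1] - ib[0]))

def rolling_ln_rate(grp, window=3):
    # Hypothetical helper: call get_ln_rate on every full window of one group
    ib = grp['IB'].to_numpy()
    ob = grp['OB'].to_numpy()
    delta = grp['Delta'].to_numpy()
    out = np.full(len(grp), np.nan)  # rows without a full window stay NaN
    for end in range(window, len(grp) + 1):
        start = end - window
        out[end - 1] = get_ln_rate(ib[start:end], ob[start:end], delta[start:end])
    # Keep the group's original row labels so the result aligns with df
    return pd.Series(out, index=grp.index)

df['Ln_Rate'] = pd.concat([rolling_ln_rate(grp) for _, grp in df.groupby('Product')])
print(df)
```

The explicit loop trades some speed for clarity; for large frames, a vectorized formulation is likely to be faster.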
Source: https://stackoverflow.com/questions/59574934/apply-rolling-function-on-pandas-dataframe-with-multiple-arguments