I have a dataframe which looks like this:
pd.DataFrame({'category': [1,1,1,2,2,2,3,3,3,4],
'order_sta
Inspired by my answer here, one can define a function first:
def mean_previous(df, Category, Order, Var):
    # Order the dataframe first (note: this sorts the caller's dataframe in place)
    df.sort_values([Category, Order], inplace=True)
    # Calculate the ordinary grouped cumulative sum
    # and then subtract the grouped cumulative sum of the current order,
    # which leaves the sum over all earlier orders in the category
    csp = df.groupby(Category)[Var].cumsum() - df.groupby([Category, Order])[Var].cumsum()
    # Calculate the ordinary grouped cumulative count
    # and then subtract the grouped cumulative count of the current order,
    # which leaves the number of rows from earlier orders
    ccp = df.groupby(Category).cumcount() - df.groupby([Category, Order]).cumcount()
    # Rows with no earlier order give 0 / 0, i.e. NaN
    return csp / ccp
The desired column is then:
df['mean'] = mean_previous(df, 'category', 'order_start', 'time')
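To make the subtraction concrete, here is a small worked example. The `toy` frame and its values are made up for illustration (they are not the data from the question), and the function is repeated only so that the snippet runs on its own.

```python
import pandas as pd

def mean_previous(df, Category, Order, Var):
    # Same approach as above: sort, then subtract the within-order
    # cumulative sum/count from the within-category ones.
    df.sort_values([Category, Order], inplace=True)
    csp = df.groupby(Category)[Var].cumsum() - df.groupby([Category, Order])[Var].cumsum()
    ccp = df.groupby(Category).cumcount() - df.groupby([Category, Order]).cumcount()
    return csp / ccp

# Hypothetical data: category 1 has one row at order 1 and two rows at order 2.
toy = pd.DataFrame({
    'category':    [1, 1, 1, 2, 2],
    'order_start': [1, 2, 2, 1, 2],
    'time':        [10.0, 20.0, 30.0, 5.0, 7.0],
})

toy['mean'] = mean_previous(toy, 'category', 'order_start', 'time')
print(toy)
# mean column: NaN, 10.0, 10.0, NaN, 5.0
```

Both rows of category 1 with `order_start == 2` get 10.0, because only the single earlier row (time 10.0) counts as "previous"; rows sharing the same order do not see each other. The first order in each category has nothing before it, so its mean is NaN.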
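Performance-wise, I believe it's very fast, since it relies only on vectorized grouped cumulative operations and avoids any row-by-row loop or apply.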
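For a sanity check, the vectorized result can be compared against a deliberately slow row-by-row reference. The sketch below is only that: the name `mean_previous_slow` and the `toy` data are hypothetical, and "previous" is assumed to mean rows of the same category with a strictly smaller order value.

```python
import pandas as pd

def mean_previous_slow(df, category, order, var):
    # Reference implementation: for each row, average `var` over rows
    # of the same category whose order is strictly smaller.
    out = []
    for _, row in df.iterrows():
        earlier = df[(df[category] == row[category]) & (df[order] < row[order])]
        out.append(earlier[var].mean())  # empty selection -> NaN
    return pd.Series(out, index=df.index)

# Hypothetical data, not taken from the question.
toy = pd.DataFrame({
    'category':    [1, 1, 1, 2, 2],
    'order_start': [1, 2, 2, 1, 2],
    'time':        [10.0, 20.0, 30.0, 5.0, 7.0],
})

slow = mean_previous_slow(toy, 'category', 'order_start', 'time')
print(slow.tolist())
# [nan, 10.0, 10.0, nan, 5.0]
```

This version filters the whole frame once per row, so its cost grows quadratically with the number of rows; it is useful for checking results on a small sample, not as a replacement.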