Python - rolling functions for GroupBy object

邮差的信 submitted on 2019-11-28 18:12:32

Note: as identified by @kekert, the following pandas pattern has been deprecated. See current solutions in the answers below.

In [16]: df.groupby('id')['x'].apply(pd.rolling_mean, 2, min_periods=1)
Out[16]: 
0    0.0
1    0.5
2    1.5
3    3.0
4    3.5
5    4.5

In [17]: df.groupby('id')['x'].cumsum()
Out[17]: 
0     0
1     1
2     3
3     3
4     7
5    12

For the Googlers who come upon this old question:

Regarding @kekert's comment on @Garrett's answer to use the new

df.groupby('id')['x'].rolling(2).mean()

rather than the now-deprecated

df.groupby('id')['x'].apply(pd.rolling_mean, 2, min_periods=1)

curiously, the new .rolling().mean() approach returns a multi-indexed Series, indexed first by the groupby column and then by the original index. The old approach simply returned a Series indexed by the original df index alone, which perhaps makes less sense, but made it very convenient to add that Series as a new column to the original dataframe.
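To see the two index levels for yourself, here is a minimal sketch. It assumes the same six-row frame used elsewhere on this page and a reasonably recent pandas; min_periods=1 is passed so the values match the old pd.rolling_mean call.

```python
import pandas as pd

# Assumed sample data: two groups of three rows each
df = pd.DataFrame({'id': ['a', 'a', 'a', 'b', 'b', 'b'], 'x': range(6)})

rolled = df.groupby('id')['x'].rolling(2, min_periods=1).mean()

# The result carries two index levels: the group key, then the original row label
print(rolled.index.nlevels)
print(list(rolled.index.names))
print(rolled)
```

The outer level is named after the groupby column, while the inner level keeps whatever labels the original index had.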

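So I think I've figured out a solution that uses the new rolling() method and still works the same. Note that min_periods=1 is needed to match the old output; without it, the first row of each group is NaN:

df.groupby('id')['x'].rolling(2, min_periods=1).mean().reset_index(0, drop=True)

which should give you the series

0    0.0
1    0.5
2    1.5
3    3.0
4    3.5
5    4.5

which you can add as a new column (named x_rolling_mean here so that the original x is not overwritten):

df['x_rolling_mean'] = df.groupby('id')['x'].rolling(2, min_periods=1).mean().reset_index(0, drop=True)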
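One thing worth checking is what happens when the groups are interleaved rather than stored in contiguous blocks. The sketch below uses a hypothetical frame and a hypothetical column name (x_mean). It relies on the original index being unique, because the assignment aligns on index labels rather than on position.

```python
import pandas as pd

# Hypothetical frame in which rows of groups 'a' and 'b' alternate
df = pd.DataFrame({'id': ['a', 'b', 'a', 'b', 'a', 'b'],
                   'x': [0, 3, 1, 4, 2, 5]})

rolled = (
    df.groupby('id')['x']
      .rolling(2, min_periods=1)
      .mean()
      .reset_index(level=0, drop=True)  # drop the group level, keep original labels
)

# rolled is ordered group by group, not in the original row order
print(list(rolled.index))

# Column assignment aligns on index labels, so each value lands on its own row
df['x_mean'] = rolled
print(df)
```

If the index contains duplicate labels, this alignment is no longer unambiguous, and a transform-based approach may be the safer choice.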

Here is another way that generalizes well and uses pandas' expanding method.

It is compact, and the same transform pattern also works for rolling window calculations with fixed windows, such as for time series, if you replace expanding() with rolling().

# Import pandas library
import pandas as pd

# Prepare columns (named ids rather than id, to avoid shadowing the built-in id())
x = range(0, 6)
ids = ['a', 'a', 'a', 'b', 'b', 'b']

# Create dataframe from columns above
df = pd.DataFrame({'id': ids, 'x': x})

# Calculate the running sum within each group (i.e. an unbounded window over all prior rows) using "expanding"
df['rolling_sum'] = df.groupby('id')['x'].transform(lambda s: s.expanding().sum())

# Output as desired by original poster
print(df)
  id  x  rolling_sum
0  a  0          0.0
1  a  1          1.0
2  a  2          3.0
3  b  3          3.0
4  b  4          7.0
5  b  5         12.0
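For the fixed-window case mentioned above, a minimal sketch could look like the following. The window size of 2, min_periods=1 and the column name rolling_sum_2 are assumptions chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({'id': ['a', 'a', 'a', 'b', 'b', 'b'], 'x': range(6)})

# Sum of the current and previous row within each group; min_periods=1 avoids
# a NaN on the first row of every group
df['rolling_sum_2'] = (
    df.groupby('id')['x']
      .transform(lambda s: s.rolling(2, min_periods=1).sum())
)
print(df)
```

Because transform returns a result aligned to the original index, no reset_index step is needed before assigning the column.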

I'm not sure of the mechanics, but calling cumsum directly on the GroupBy object works, as shown below. Note that in the pandas version I tested, the returned value was just an ndarray; recent pandas versions return a Series aligned to the original index. I think you could apply any cumulative or "rolling" function in this manner and get the same kind of result.

I have tested it with cumprod, cummax and cummin and they all returned an ndarray. I think pandas is smart enough to know that these functions return one value per row rather than one value per group, so the function is applied as a transformation rather than an aggregation.

In [35]: df.groupby('id')['x'].cumsum()
Out[35]:
0     0
1     1
2     3
3     3
4     7
5    12

Edit: I found it curious that this syntax does return a Series:

In [54]: df.groupby('id')['x'].transform('cumsum')
Out[54]:
0     0
1     1
2     3
3     3
4     7
5    12
Name: x
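Since the return type has changed between pandas versions, it may be worth checking what your own installation does. A small sketch, assuming the same sample frame as above:

```python
import pandas as pd

df = pd.DataFrame({'id': ['a', 'a', 'a', 'b', 'b', 'b'], 'x': range(6)})
grouped = df.groupby('id')['x']

# Call each cumulative method on the GroupBy object and inspect what comes back
results = {name: getattr(grouped, name)()
           for name in ('cumsum', 'cumprod', 'cummax', 'cummin')}

for name, result in results.items():
    print(name, type(result).__name__, result.tolist())
```

On recent pandas releases each call should report a Series with the same index as df, so the result can be assigned straight back as a column.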