Pandas monthly rolling operation

醉梦人生 2020-11-30 06:06

I figured this out while writing the question, so I'll post it anyway and answer it myself in case someone else needs a little help.

Problem

2 Answers
  • 2020-11-30 06:28

    Use the "D" offset rather than "M". Specifically, use "30D" for a 30-day window, which is approximately one month.

    df = df.rolling("30D").sum()
    

    Initially, I jumped to "M" because I assumed it meant "one month", but now it's clear why that doesn't work: "M" is the month-end offset, not a fixed length of time, and a time-based rolling window needs a fixed length.
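As a minimal sketch of the "30D" approach, the snippet below builds a small frame (the dates and the `spendings` column are borrowed from the sample table in the other answer, so treat them as illustrative) and applies a 30-day rolling sum. It also checks that a month-end offset is rejected as a window; `pd.offsets.MonthEnd()` is used here as the object behind the "M" alias, which newer pandas releases spell "ME".

```python
import pandas as pd

# Illustrative data: dates and amounts taken from the sample table in this thread.
dates = pd.to_datetime([
    "2014-03-25", "2014-04-05", "2014-04-15", "2014-04-25",
    "2014-05-05", "2014-05-15", "2014-05-25",
])
df = pd.DataFrame({"spendings": [10, 20, 10, 10, 10, 10, 10]}, index=dates)

# "30D" is a fixed-length window: each row sums the values in (t - 30 days, t].
rolling_30d = df["spendings"].rolling("30D").sum()
print(rolling_30d)

# A month-end offset has no fixed length, so rolling refuses it as a window.
try:
    df["spendings"].rolling(pd.offsets.MonthEnd())
    month_window_rejected = False
except ValueError:
    month_window_rejected = True
print(month_window_rejected)
```

Note that the left edge of the 30-day window is exclusive by default, so a row exactly 30 days earlier is not counted.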

  • 2020-11-30 06:38

    As for why you cannot use aliases like "AS" or "Y": the "Y" offset is not "a year". It refers to YearEnd, just as "AS" refers to the start of the year (see http://pandas.pydata.org/pandas-docs/stable/timeseries.html#offset-aliases). The rolling function therefore does not get a fixed window: the span the offset covers depends on where the date falls (for example, about a year from Jan 1 but only about a day at the end of December).
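A small sketch of why an anchored offset has no fixed length: adding `pd.offsets.YearEnd()` to two different dates in the same year lands on the same anchor, so the distance covered varies. The two sample dates are arbitrary choices for illustration.

```python
import pandas as pd

year_end = pd.offsets.YearEnd()  # the offset that the "Y" alias refers to

start_of_year = pd.Timestamp("2014-01-01")
late_december = pd.Timestamp("2014-12-30")

# Both dates roll forward to the same anchor, 2014-12-31.
gap_from_january = (start_of_year + year_end) - start_of_year
gap_from_december = (late_december + year_end) - late_december

print(gap_from_january.days, gap_from_december.days)  # 364 1
```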

    The proposed solution (a 30D offset) works if you do not need strict calendar months. Alternatively, you can iterate over your date index and slice with a calendar offset, which gives you more precise control over the sum.

    If you have to do it in one line (separated for readability):

    import pandas as pd

    # For each date in the index, sum 'spendings' over the preceding
    # calendar month (both endpoints of the slice are inclusive).
    df['Sum'] = [
        df.loc[
            edt - pd.DateOffset(months=1):edt, 'spendings'
        ].sum() for edt in df.index
    ]

                spendings category  Sum
    date
    2014-03-25         10        A   10
    2014-04-05         20        A   30
    2014-04-15         10        A   40
    2014-04-25         10        B   50
    2014-05-05         10        B   50
    2014-05-15         10        A   40
    2014-05-25         10        A   40
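To reproduce the table above from scratch, here is a self-contained sketch. The frame is rebuilt by hand from the printed output, so its construction is an assumption about how the original `df` looked (a sorted DatetimeIndex named `date`).

```python
import pandas as pd

# Rebuild the sample frame from the printed table (assumed structure).
dates = pd.to_datetime([
    "2014-03-25", "2014-04-05", "2014-04-15", "2014-04-25",
    "2014-05-05", "2014-05-15", "2014-05-25",
])
df = pd.DataFrame(
    {"spendings": [10, 20, 10, 10, 10, 10, 10], "category": list("AAABBAA")},
    index=dates,
)
df.index.name = "date"

# Calendar-aware window: one month back from each date, inclusive on both ends.
one_month = pd.DateOffset(months=1)
df["Sum"] = [df.loc[edt - one_month:edt, "spendings"].sum() for edt in df.index]
print(df)
```

Because label slicing on a sorted DatetimeIndex includes both endpoints, the row for 2014-04-25 also counts 2014-03-25, which a "30D" rolling window would leave out.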
    