pandas get 30 day rolling window over n years

Question

I'm trying to take a 30-day window going backwards from every date in a DataFrame, and also look at the same 30-day window across all of the years in the dataset. The dates run from 2000 to 2019. For example, starting on 1 Feb 2000, I would like to grab the previous 30 days, plus the 30 days before 1 Feb in every other year.
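
To make the target concrete, here is a small sketch of the date ranges involved for a single reference date. The helper name `cross_year_windows` and its parameters are hypothetical, introduced only for illustration, and it assumes that "previous 30 days" excludes the reference date itself.

```python
import pandas as pd

def cross_year_windows(ref_date, years, days=30):
    """Return (first_day, last_day) of the `days` days before the same
    calendar date in each of `years` (hypothetical helper)."""
    ref = pd.Timestamp(ref_date)
    windows = []
    for year in years:
        # Move the reference date into the target year; DateOffset maps
        # 29 Feb to 28 Feb in non-leap years.
        anchor = ref + pd.DateOffset(years=year - ref.year)
        first_day = anchor - pd.Timedelta(days=days)
        last_day = anchor - pd.Timedelta(days=1)  # reference date excluded
        windows.append((first_day, last_day))
    return windows

windows = cross_year_windows("2000-02-01", range(2000, 2003))
for first_day, last_day in windows:
    print(first_day.date(), "to", last_day.date())
```

For 1 Feb, each window runs from 2 Jan to 31 Jan of the corresponding year, both ends inclusive.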

I can get a rolling window to work over n days for a z-score:

import pandas as pd
from random import randint
# sample data: one random integer per day
dt = pd.date_range(start='2000-01-01', end='2019-03-01')
x = [randint(0, 100) for _ in range(len(dt))]
DTX = pd.DataFrame({'X': x}, index=dt)

def zscore(x, window):
    """ calculate z-score across a window (assumes normal distribution) """
    r = x.rolling(window=window)
    m = r.mean().shift(1)
    s = r.std(ddof=0).shift(1)
    z = (x-m)/s
    return z

DTX['Z'] = zscore(DTX['X'], 30)

Or a rank for the window:

def ranked_percent(col, window):
    """ rank values in a window as a decimal (highest=1)"""
    pctrank = lambda x: pd.Series(x).rank(pct=True).iloc[-1]
    rollingrank=col.rolling(window=window,raw=False).apply(pctrank)
    return rollingrank

DTX['Rank'] = ranked_percent(DTX['X'], 30)

I was wondering about using groupby and pd.Grouper, but I have no idea how to implement it. I'm not wedded to that approach, though; any fairly vectorized (fast) Python solution would help. I really need to extend this over all of the years in the dataset. I would appreciate any help. Many thanks.
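
A minimal sketch of one way the cross-year z-score might be computed without an explicit loop. It is an illustration under stated assumptions, not a confirmed solution: the function name `cross_year_zscore` is hypothetical, the series is assumed to be daily with no gaps, the pool includes the current year and later years (so it looks ahead in time), and 29 Feb is pooled only with other leap years. The idea is to compute per-date rolling sums over the preceding 30 days, then add those sums across all rows that share the same month and day.

```python
import numpy as np
import pandas as pd

def cross_year_zscore(x, window=30):
    """z-score of each value against the previous `window` days,
    pooled over every year in the series (hypothetical helper)."""
    # Per-date count, sum and sum of squares of the preceding `window`
    # days; shift(1) keeps the current day out of its own window.
    stats = pd.DataFrame({
        "n": x.notna().astype(float).rolling(window=window).sum().shift(1),
        "s": x.rolling(window=window).sum().shift(1),
        "ss": (x ** 2).rolling(window=window).sum().shift(1),
    })
    # Add up the windows that end just before the same calendar date in
    # each year; incomplete windows are NaN and are skipped by the sum.
    pooled = stats.groupby([x.index.month, x.index.day]).transform("sum")
    mean = pooled["s"] / pooled["n"]
    # Population variance (ddof=0), clipped to guard against tiny
    # negative values from floating-point rounding.
    var = (pooled["ss"] / pooled["n"] - mean ** 2).clip(lower=0)
    return (x - mean) / np.sqrt(var)

# Sample data shaped like the question's, with a fixed seed.
dt = pd.date_range(start="2000-01-01", end="2019-03-01")
rng = np.random.default_rng(0)
x = pd.Series(rng.integers(0, 101, size=len(dt)), index=dt)
z = cross_year_zscore(x, 30)
```

Because the pooled mean and standard deviation are rebuilt from sums, the only grouped operation is one `transform("sum")`, which keeps the work vectorized. A rank across the pooled windows cannot be rebuilt from sums in the same way and would need a different approach.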

Source: https://stackoverflow.com/questions/55104775/pandas-get-30-day-rolling-window-over-n-years

Tags

python

pandas

time-series

pandas-groupby

rolling-computation