Question
In R you can compute a rolling mean with a specified window that shifts by a specified amount each time.
Maybe I just haven't found it, but it doesn't seem possible in pandas or any other Python library.
Does anyone know of a way around this? Here is an example of what I mean:
Here we have half-monthly data, and I am computing the two-month moving average, which shifts by one month each time.
So in R I would do something like: two_month__movavg=rollapply(mydata,4,mean,by = 2,na.pad = FALSE)
Is there no equivalent in Python?
EDIT1:
                  DATE    A DEMAND  ...   AA DEMAND  A Price
0  2006/01/01 00:30:00  8013.27833  ...  5657.67500    20.03
1  2006/01/01 01:00:00  7726.89167  ...  5460.39500    18.66
2  2006/01/01 01:30:00  7372.85833  ...  5766.02500    20.38
3  2006/01/01 02:00:00  7071.83333  ...  5503.25167    18.59
4  2006/01/01 02:30:00  6865.44000  ...  5214.01500    17.53
Answer 1:
You can use rolling again; it just takes a little work on the index you assign to.
Here by = 2:
by = 2
df.loc[df.index[np.arange(len(df))%by==1],'New']=df.Price.rolling(window=4).mean()
df
    Price    New
0      63    NaN
1      92    NaN
2      92    NaN
3       5  63.00
4      90    NaN
5       3  47.50
6      81    NaN
7      98  68.00
8     100    NaN
9      58  84.25
10     38    NaN
11     15  52.75
12     75    NaN
13     19  36.75
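For anyone who wants to run the snippet above end to end, here is a minimal self-contained sketch. It rebuilds the Price column from the output table above. The names `win`, `offset` and `mask` are introduced here for illustration and are not part of the original answer, and writing the mask offset as (win - 1) % by is an assumption about how to generalize the hard-coded `== 1` to other window and step sizes.

```python
import numpy as np
import pandas as pd

# Price values copied from the output table above
df = pd.DataFrame({"Price": [63, 92, 92, 5, 90, 3, 81, 98, 100, 58, 38, 15, 75, 19]})

win = 4  # window length (illustrative name)
by = 2   # shift between consecutive windows

# Row win-1 is the first one with a full window, so keep the rows whose
# position has the same remainder modulo `by` as win-1 (here: the odd rows).
offset = (win - 1) % by
mask = np.arange(len(df)) % by == offset

# The assignment aligns on the index, so only the masked rows receive a value;
# masked rows that come before the first full window stay NaN.
df.loc[df.index[mask], "New"] = df["Price"].rolling(window=win).mean()
print(df)
```

With win = 4 and by = 2 the offset works out to 1, so this reduces to the one-liner in the answer.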
Answer 2:
This is a bit of overkill for a 1D array of data, but you can simplify it and pull out what you need. Since pandas relies on numpy, you might want to check how its rolling/strided function is implemented. Here are the results for 20 sequential numbers, using a 7-day window that strides/slides by 2:
z = np.arange(20)
z #array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19])
s = stride(z, (7,), (2,))
np.mean(s, axis=1) # array([ 3., 5., 7., 9., 11., 13., 15.])
Here is the code I use, without the major portion of the documentation. It is derived from the many implementations of strided functions in numpy that can be found on this site. There are variants and incarnations; this is just another.
import numpy as np

def stride(a, win=(3, 3), stepby=(1, 1)):
    """Provide a 2D sliding/moving view of an array.
    There is no edge correction for outputs. Use the `pad_` function first."""
    err = """Array shape, window and/or step size error.
    Use win=(3,) with stepby=(1,) for 1D array
    or win=(3,3) with stepby=(1,1) for 2D array
    or win=(1,3,3) with stepby=(1,1,1) for 3D
    ---- a.ndim != len(win) != len(stepby) ----
    """
    from numpy.lib.stride_tricks import as_strided
    a_ndim = a.ndim
    # allow a single int to stand for the same window/step on every axis
    if isinstance(win, int):
        win = (win,) * a_ndim
    if isinstance(stepby, int):
        stepby = (stepby,) * a_ndim
    assert (a_ndim == len(win)) and (len(win) == len(stepby)), err
    shp = np.array(a.shape)    # array shape (r, c) or (d, r, c)
    win_shp = np.array(win)    # window (3, 3) or (1, 3, 3)
    ss = np.array(stepby)      # step by (1, 1) or (1, 1, 1)
    # leading axes count the windows, trailing axes index inside a window
    newshape = tuple(((shp - win_shp) // ss) + 1) + tuple(win_shp)
    newstrides = tuple(np.array(a.strides) * ss) + a.strides
    a_s = as_strided(a, shape=newshape, strides=newstrides, subok=True).squeeze()
    return a_s
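If the installed NumPy is 1.20 or later (an assumption; the function above does not depend on it), `numpy.lib.stride_tricks.sliding_window_view` offers a shorter route to the same 1D result. The sketch below builds every window of length 7 and then keeps every second one, which should match the `stride(z, (7,), (2,))` call shown earlier. The names `win`, `by`, `windows` and `strided` are illustrative.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

z = np.arange(20)
win, by = 7, 2

# All windows of length `win` as a read-only view, one window per row
windows = sliding_window_view(z, win)

# Keep every `by`-th window (basic slicing, so still a view), then average each row
strided = windows[::by]
means = strided.mean(axis=1)
print(means)
```

This function checks the window size against the array shape, so an oversized window raises an error, whereas a hand-written `as_strided` call with a wrong shape can silently read memory outside the array.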
I failed to point out that you can create an output that you could append as a column in pandas. Going back to the original definitions used above:
nans = np.full_like(z, np.nan, dtype='float') # z is the 20 number sequence
means = np.mean(s, axis=1) # results from the strided mean
# assign the means to the output array skipping the first and last 3 and striding by 2
nans[3:-3:2] = means
nans # array([nan, nan, nan, 3., nan, 5., nan, 7., nan, 9., nan, 11., nan, 13., nan, 15., nan, nan, nan, nan])
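The same NaN-padded column can also be sketched with pandas alone. This variant is not from the answer above: it assumes a centered window (`center=True`), so that each mean sits at the middle of its 7-element window, and then blanks out the rows that fall between steps. The names `half`, `full`, `keep` and `out` are illustrative.

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(20))
win, by = 7, 2
half = win // 2  # position of the window centre, 3 for a 7-wide window

# Centered rolling mean: NaN for the first and last `half` rows
full = s.rolling(win, center=True).mean()

# Keep only the rows that are a whole number of steps away from the first centre
keep = (np.arange(len(s)) - half) % by == 0
out = full.where(keep)
print(out.to_numpy())
```

Because `out` keeps the original index, it can be assigned directly as a new DataFrame column next to the source data.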
Answer 3:
If the data size is not too large, here is an easy way:
by = 2
win = 4
start = 3 ## the index of your first valid value, i.e. win - 1
df.rolling(win).mean()[start::by] ## calculate everything, then keep what you need
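Newer pandas releases also accept a `step` argument on `rolling`. As far as I know it was added in pandas 1.5, so the sketch below assumes at least that version. It evaluates the window only at every `by`-th row, counting from row 0, which makes it equivalent to slicing with `[::by]` rather than `[start::by]`. With win = 4 and by = 2 the windows therefore end on even rows, one row out of phase with the slice above. The sample values are borrowed from the first answer's Price column, and the names `stepped` and `sliced` are illustrative.

```python
import numpy as np
import pandas as pd

# Sample values borrowed from the first answer's Price column
df = pd.DataFrame({"Price": [63, 92, 92, 5, 90, 3, 81, 98, 100, 58, 38, 15, 75, 19]})

by = 2
win = 4

# Assumes pandas >= 1.5: only every `by`-th window is evaluated
stepped = df.rolling(win, step=by).mean()

# The same rows obtained by computing every window and slicing from row 0
sliced = df.rolling(win).mean()[::by]

# Compare the two results, treating NaN as equal to NaN
same = np.allclose(stepped["Price"].to_numpy(), sliced["Price"].to_numpy(), equal_nan=True)
print(same)
print(stepped)
```

If the phase matters, the slice with an explicit start remains the simpler choice, since `step` always counts from the first row.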
Source: https://stackoverflow.com/questions/54301042/is-there-no-option-for-step-size-in-pandas-dataframe-rolling-is-there-another-f