Question:
How can I get the exponentially weighted moving average in NumPy, just like in pandas?
    import pandas as pd
    import pandas_datareader as pdr
    from datetime import datetime

    # Declare variables
    ibm = pdr.get_data_yahoo(symbols='IBM', start=datetime(2000, 1, 1), end=datetime(2012, 1, 1)).reset_index(drop=True)['Adj Close']
    windowSize = 20

    # Get the pandas exponentially weighted moving average
    # (.values replaces .as_matrix(), which newer pandas versions no longer provide)
    ewm_pd = pd.DataFrame(ibm).ewm(span=windowSize, min_periods=windowSize).mean().values
    print(ewm_pd)
I tried the following with NumPy:
    import numpy as np
    import pandas_datareader as pdr
    from datetime import datetime

    # From this post: http://stackoverflow.com/a/40085052/3293881 by @Divakar
    def strided_app(a, L, S):  # Window len = L, Stride len/stepsize = S
        nrows = ((a.size - L) // S) + 1
        n = a.strides[0]
        return np.lib.stride_tricks.as_strided(a, shape=(nrows, L), strides=(S * n, n))

    def numpyEWMA(price, windowSize):
        weights = np.exp(np.linspace(-1., 0., windowSize))
        weights /= weights.sum()

        a2D = strided_app(price, windowSize, 1)

        returnArray = np.empty((price.shape[0]))
        returnArray.fill(np.nan)
        for index in (range(a2D.shape[0])):
            returnArray[index + windowSize-1] = np.convolve(weights, a2D[index])[windowSize - 1:-windowSize + 1]
        return np.reshape(returnArray, (-1, 1))

    # Declare variables
    ibm = pdr.get_data_yahoo(symbols='IBM', start=datetime(2000, 1, 1), end=datetime(2012, 1, 1)).reset_index(drop=True)['Adj Close']
    windowSize = 20

    # Get the NumPy exponentially weighted moving average
    ewma_np = numpyEWMA(ibm, windowSize)
    print(ewma_np)
but the results do not match the ones from pandas.
Is there a better approach to calculate the exponentially weighted moving average directly in NumPy and get exactly the same result as pandas.ewm().mean()?
For 60,000 requests, the pandas solution takes about 230 seconds. I am sure that a pure NumPy version could reduce this significantly.
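One likely source of the mismatch is the weighting itself. The sketch below compares the weights from the attempt above with the geometric weights implied by span=windowSize, assuming the usual relation alpha = 2 / (span + 1). It also checks how np.convolve orders its operands. The names attempt_w and span_w are made up for this illustration.

```python
import numpy as np

window = 20

# Weights used in the attempt above (index 0 is the smallest weight).
attempt_w = np.exp(np.linspace(-1.0, 0.0, window))
attempt_w /= attempt_w.sum()

# Geometric weights implied by span=window: alpha = 2 / (span + 1),
# weight (1 - alpha)**k for the observation k steps in the past.
alpha = 2.0 / (window + 1.0)
span_w = (1.0 - alpha) ** np.arange(window)[::-1]   # oldest first
span_w /= span_w.sum()

# Ratio of largest to smallest weight in each scheme.
print(attempt_w[-1] / attempt_w[0])   # exp(1)
print(span_w[-1] / span_w[0])         # (1 - alpha) ** -(window - 1)

# np.convolve reverses one operand: the full convolution at index window - 1
# pairs w[k] with x[window - 1 - k], so w[-1] multiplies the OLDEST sample.
x = np.arange(window, dtype=float)
print(np.isclose(np.convolve(attempt_w, x)[window - 1], np.dot(attempt_w[::-1], x)))
```

Under these assumptions the two weight profiles differ in shape, and the convolution in the attempt applies its largest weight to the oldest sample in each window rather than the newest.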
Answer 1:
Think I have finally cracked it!
Here's a vectorized version of the numpy_ewma function from @RaduS's post, which is claimed to produce the correct results -
    def numpy_ewma_vectorized(data, window):
        alpha = 2 / (window + 1.0)
        alpha_rev = 1 - alpha

        scale = 1 / alpha_rev
        n = data.shape[0]

        r = np.arange(n)
        scale_arr = scale**r
        offset = data[0] * alpha_rev**(r+1)
        pw0 = alpha * alpha_rev**(n-1)

        mult = data * pw0 * scale_arr
        cumsums = mult.cumsum()
        out = offset + cumsums * scale_arr[::-1]
        return out
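To see why a cumulative sum can reproduce the loop, it may help to unroll the recursion. The derivation below is a sketch that assumes the recursive definition seeded with the first sample. The symbols e_t and β (standing for alpha_rev) are introduced here for illustration only.

```latex
\begin{aligned}
e_t &= \alpha x_t + \beta\, e_{t-1}, \qquad e_{-1} = x_0, \quad \beta = 1 - \alpha \\
    &= \beta^{t+1} x_0 + \alpha \sum_{j=0}^{t} \beta^{t-j} x_j \\
    &= \underbrace{\beta^{t+1} x_0}_{\texttt{offset}[t]}
     + \underbrace{\beta^{-(n-1-t)}}_{\texttt{scale\_arr[::-1]}[t]}
       \underbrace{\sum_{j=0}^{t} \alpha\, \beta^{n-1} \beta^{-j} x_j}_{\texttt{cumsums}[t]}
\end{aligned}
```

The last line matches the three arrays in the function term by term: the product of the prefactor and each summand collapses back to α β^(t-j) x_j.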
Further boost
We can boost it further with some code re-use, like so -
    def numpy_ewma_vectorized_v2(data, window):
        alpha = 2 / (window + 1.0)
        alpha_rev = 1 - alpha
        n = data.shape[0]

        pows = alpha_rev**(np.arange(n+1))

        scale_arr = 1 / pows[:-1]
        offset = data[0] * pows[1:]
        pw0 = alpha * alpha_rev**(n-1)

        mult = data * pw0 * scale_arr
        cumsums = mult.cumsum()
        out = offset + cumsums * scale_arr[::-1]
        return out
Runtime test
Let's time these two against the same loopy function for a big dataset.
Around 17x speedup there!
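A comparison along these lines can be sketched with timeit. This is a rough harness, not the benchmark used above: ewma_loop and ewma_vectorized are illustrative names, the loop is assumed to be the plain recursion seeded with data[0], and the input is kept short so the power terms stay within float64 range. Measured ratios will vary by machine.

```python
import timeit
import numpy as np

def ewma_loop(data, window):
    # Recursive reference: e_t = e_{t-1} + alpha * (x_t - e_{t-1}), seeded with data[0].
    alpha = 2.0 / (window + 1.0)
    out = np.empty(data.shape[0])
    e = data[0]
    for i in range(data.shape[0]):
        e += alpha * (data[i] - e)
        out[i] = e
    return out

def ewma_vectorized(data, window):
    # Same arithmetic as numpy_ewma_vectorized_v2: one cumsum instead of a loop.
    alpha = 2.0 / (window + 1.0)
    alpha_rev = 1.0 - alpha
    n = data.shape[0]
    pows = alpha_rev ** np.arange(n + 1)
    scale_arr = 1.0 / pows[:-1]
    offset = data[0] * pows[1:]
    mult = data * (alpha * alpha_rev ** (n - 1)) * scale_arr
    return offset + mult.cumsum() * scale_arr[::-1]

rng = np.random.default_rng(0)
data = rng.random(2000)
window = 20

# Both versions should agree before their speed is compared.
assert np.allclose(ewma_loop(data, window), ewma_vectorized(data, window))

t_loop = timeit.timeit(lambda: ewma_loop(data, window), number=20)
t_vec = timeit.timeit(lambda: ewma_vectorized(data, window), number=20)
print(f"loop: {t_loop:.4f}s  vectorized: {t_vec:.4f}s")
```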
Answer 2:
Here is an implementation using NumPy that is equivalent to using df.ewm(alpha=alpha).mean(). After reading the documentation, it is just a few matrix operations. The trick is constructing the right matrices.
It is worth noting that because we are creating float matrices, you can quickly eat through your memory if the input array is too large.
    import pandas as pd
    import numpy as np

    def ewma(x, alpha):
        '''
        Returns the exponentially weighted moving average of x.

        Parameters:
        -----------
        x : array-like
        alpha : float {0
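As a rough illustration of the matrix idea, here is a minimal sketch. It is not the answer's own function: ewma_matrix is a hypothetical name, and the code assumes the adjusted definition (each output is a weighted average of all samples so far, with weight (1 - alpha)**k for the sample k steps back), which is what pandas uses by default.

```python
import numpy as np

def ewma_matrix(x, alpha):
    """Adjusted EWMA via an explicit lower-triangular weight matrix (O(n^2) memory)."""
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    idx = np.arange(n)
    # lag[t, j] = t - j: how many steps observation j lies behind time t.
    lag = idx[:, None] - idx[None, :]
    # Weight (1 - alpha)**lag for past and current observations, 0 for future ones.
    weights = np.where(lag >= 0, (1.0 - alpha) ** np.maximum(lag, 0), 0.0)
    # Each row is a weighted average of x[0..t], normalised by that row's weight sum.
    return weights @ x / weights.sum(axis=1)

print(ewma_matrix([1.0, 2.0, 3.0], alpha=0.5))
```

The n-by-n weights array is what makes the memory cost grow quadratically with the input length.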
Let's test it:
    alpha = 0.55
    x = np.random.randint(0,30,15)
    df = pd.DataFrame(x, columns=['A'])
    df.ewm(alpha=alpha).mean()

    # returns:
    #             A
    # 0   13.000000
    # 1   22.655172
    # 2   20.443268
    # 3   12.159796
    # 4   14.871955
    # 5   15.497575
    # 6   20.743511
    # 7   20.884818
    # 8   24.250715
    # 9   18.610901
    # 10  17.174686
    # 11  16.528564
    # 12  17.337879
    # 13   7.801912
    # 14  12.310889

    ewma(x=x, alpha=alpha)

    # returns:
    # array([ 13.        ,  22.65517241,  20.44326778,  12.1597964 ,
    #         14.87195534,  15.4975749 ,  20.74351117,  20.88481763,
    #         24.25071484,  18.61090129,  17.17468551,  16.52856393,
    #         17.33787888,   7.80191235,  12.31088889])
Answer 3:
Given alpha and windowSize, here's an approach to simulate the corresponding behavior in NumPy -
    def numpy_ewm_alpha(a, alpha, windowSize):
        wghts = (1-alpha)**np.arange(windowSize)
        wghts /= wghts.sum()
        out = np.full(a.shape[0], np.nan)  # sized from the input a, not a global df
        out[windowSize-1:] = np.convolve(a, wghts, 'valid')
        return out
Sample runs for verification -
    In [54]: alpha = 0.55
        ...: windowSize = 20
        ...:

    In [55]: df = pd.DataFrame(np.random.randint(2,9,(100)))

    In [56]: out0 = df.ewm(alpha = alpha, min_periods=windowSize).mean().as_matrix().ravel()
        ...: out1 = numpy_ewm_alpha(df.values.ravel(), alpha = alpha, windowSize = windowSize)
        ...: print "Max. error : " + str(np.nanmax(np.abs(out0 - out1)))
        ...:
    Max. error : 5.10531254605e-07

    In [57]: alpha = 0.75
        ...: windowSize = 30
        ...:

    In [58]: out0 = df.ewm(alpha = alpha, min_periods=windowSize).mean().as_matrix().ravel()
        ...: out1 = numpy_ewm_alpha(df.values.ravel(), alpha = alpha, windowSize = windowSize)
        ...: print "Max. error : " + str(np.nanmax(np.abs(out0 - out1)))
    Max. error : 8.881784197e-16
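The gap between the two error levels is plausibly a truncation effect: a finite window drops every weight beyond windowSize lags, while pandas keeps the whole history. As a rough guide, the dropped share of a geometric weight series is at most (1 - alpha)**windowSize. The helper truncated_tail below is a hypothetical name used only for this sketch.

```python
import sys

def truncated_tail(alpha, window_size):
    # Share of an infinite geometric weight series carried by lags >= window_size:
    # sum_{k >= W} b**k / sum_{k >= 0} b**k = b**W, with b = 1 - alpha.
    return (1.0 - alpha) ** window_size

for alpha, window_size in [(0.55, 20), (0.75, 30)]:
    tail = truncated_tail(alpha, window_size)
    print(alpha, window_size, tail, tail < sys.float_info.epsilon)
```

For alpha = 0.55 and a window of 20 the dropped share is on the order of 1e-7, whereas for alpha = 0.75 and a window of 30 it falls below double-precision epsilon. That is consistent with one run showing a visible error and the other showing only rounding noise.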
Runtime test on bigger dataset -
Further boost
For a further performance boost, we could avoid the initialization with NaNs and instead use the array output by np.convolve, like so -
    def numpy_ewm_alpha_v2(a, alpha, windowSize):
        wghts = (1-alpha)**np.arange(windowSize)
        wghts /= wghts.sum()
        out = np.convolve(a, wghts)
        out[:windowSize-1] = np.nan
        return out[:a.size]
Timings -
Answer 4:
@Divakar's answer seems to cause overflow when dealing with numpy_ewma_vectorized(np.random.random(500000), 10).
What I have been using is:
    def EMA(input, time_period=10):  # For time period = 10
        t_ = time_period - 1
        ema = np.zeros_like(input, dtype=float)
        multiplier = 2.0 / (time_period + 1)
        #multiplier = 1 - multiplier
        for i in range(len(input)):
            # Special Case
            if i > t_:
                ema[i] = (input[i] - ema[i-1]) * multiplier + ema[i-1]
            else:
                ema[i] = np.mean(input[:i+1])
        return ema
However, this is way slower than the pandas solution:
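    import numpy as np
    import pandas as pd

    def EMA_fast(X, time_period=10):
        # pandas.ewma is gone from current pandas; Series.ewm(...).mean() replaces it.
        out = np.array(pd.Series(X).ewm(span=time_period, min_periods=time_period).mean())
        # Fill the warm-up period with the running mean of the input.
        out[:time_period-1] = np.cumsum(X[:time_period-1]) / np.arange(1, time_period)
        return out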
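The overflow mentioned at the start of this answer can be located without running the full 500000-sample case. The sketch below assumes the problem comes from the term (1 / alpha_rev)**r in numpy_ewma_vectorized, which grows geometrically with the index r; the name r_limit is introduced here for illustration.

```python
import numpy as np

window = 10
alpha = 2.0 / (window + 1.0)
alpha_rev = 1.0 - alpha

# (1 / alpha_rev)**r exceeds the largest finite float64 once
# r * log(1 / alpha_rev) > log(max_float).
max_float = np.finfo(np.float64).max
r_limit = int(np.log(max_float) / np.log(1.0 / alpha_rev))
print(r_limit)

# Evaluate the powers just around that limit; overflow shows up as inf.
with np.errstate(over="ignore"):
    scale_arr = (1.0 / alpha_rev) ** np.arange(r_limit + 2, dtype=np.float64)
print(np.isfinite(scale_arr[r_limit]), np.isinf(scale_arr[r_limit + 1]))
```

With window = 10 the limit is a few thousand samples, far below 500000, so the cumulative-sum trick would have to be applied in chunks, or replaced by a recursive form, for inputs of that size.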
Answer 5:
Here is another solution I came up with in the meantime; it is about 4 times faster than the pandas solution.
    def numpy_ewma(data, window):
        returnArray = np.empty((data.shape[0]))
        returnArray.fill(np.nan)
        e = data[0]
        alpha = 2 / float(window + 1)
        for s in range(data.shape[0]):
            e = ((data[s]-e) * alpha) + e
            returnArray[s] = e
        return returnArray
I used this formula as a starting point. I am sure that it can be improved further.
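Written out, the loop in numpy_ewma applies the recursion below. The symbols e_t and x_t are introduced here for illustration; the seed and the value of α are read directly off the code.

```latex
e_t = e_{t-1} + \alpha\,(x_t - e_{t-1}) = \alpha\, x_t + (1 - \alpha)\, e_{t-1},
\qquad e_{-1} = x_0, \quad \alpha = \frac{2}{\text{window} + 1}
```

As a small worked case, window = 3 gives α = 0.5, and the input x = (1, 2, 3) yields e_0 = 1, e_1 = 1.5 and e_2 = 2.25.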
Answer 6:
This answer may seem off-topic, but for those who also need to calculate the exponentially weighted variance (and standard deviation) with NumPy, the following solution will be useful:
    import numpy as np

    def ew(a, alpha, winSize):
        _alpha = 1 - alpha
        ws = _alpha ** np.arange(winSize)  # weight for each lag, newest first
        w_sum = ws.sum()
        # Normalise by the weight sum so this is a weighted mean, as the variance below is
        ew_mean = np.convolve(a, ws)[winSize - 1] / w_sum
        bias = (w_sum ** 2) / ((w_sum ** 2) - (ws ** 2).sum())
        ew_var = (np.convolve((a - ew_mean) ** 2, ws)[winSize - 1] / w_sum) * bias
        ew_std = np.sqrt(ew_var)
        return (ew_mean, ew_var, ew_std)
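A quick way to sanity-check this kind of function is to compare it with np.average using explicit weights. The sketch below is only an illustration: ew_first_window is a hypothetical name, it assumes the mean is meant to be normalised by the weight sum, and it relies on the fact that index winSize - 1 of the full convolution covers only the first winSize samples.

```python
import numpy as np

def ew_first_window(a, alpha, win_size):
    # Same arithmetic as ew above, with the mean normalised by the weight sum.
    ws = (1.0 - alpha) ** np.arange(win_size)
    w_sum = ws.sum()
    mean = np.convolve(a, ws)[win_size - 1] / w_sum
    bias = w_sum ** 2 / (w_sum ** 2 - (ws ** 2).sum())
    var = np.convolve((a - mean) ** 2, ws)[win_size - 1] / w_sum * bias
    return mean, var, np.sqrt(var)

a = np.array([2.0, 4.0, 4.0, 5.0, 7.0, 9.0])
alpha, win_size = 0.5, 4
mean, var, std = ew_first_window(a, alpha, win_size)

# Reference: explicit weights on the first win_size samples, newest weighted most.
w = (1.0 - alpha) ** np.arange(win_size)[::-1]
ref_mean = np.average(a[:win_size], weights=w)
ref_var_biased = np.average((a[:win_size] - ref_mean) ** 2, weights=w)

print(np.isclose(mean, ref_mean))
print(var >= ref_var_biased)   # the bias-correction factor is at least 1
```

Note that the result describes a single window, the first win_size samples of a, rather than a rolling series.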