Numpy version of “Exponential weighted moving average”, equivalent to pandas.ewm().mean()

Anonymous (unverified), submitted on 2019-12-03 01:27:01

Question:

How do I get the exponential weighted moving average in NumPy, just as pandas computes it here:

    import pandas as pd
    import pandas_datareader as pdr
    from datetime import datetime

    # declare variables
    ibm = pdr.get_data_yahoo(symbols='IBM', start=datetime(2000, 1, 1), end=datetime(2012, 1, 1)).reset_index(drop=True)['Adj Close']
    windowSize = 20

    # get PANDAS exponential weighted moving average
    # (.values replaces as_matrix(), which was removed in pandas 1.0)
    ewm_pd = pd.DataFrame(ibm).ewm(span=windowSize, min_periods=windowSize).mean().values

    print(ewm_pd)

I tried the following with NumPy:

    import numpy as np
    import pandas_datareader as pdr
    from datetime import datetime

    # From this post : http://stackoverflow.com/a/40085052/3293881 by @Divakar
    def strided_app(a, L, S):  # Window len = L, Stride len/stepsize = S
        nrows = ((a.size - L) // S) + 1
        n = a.strides[0]
        return np.lib.stride_tricks.as_strided(a, shape=(nrows, L), strides=(S * n, n))

    def numpyEWMA(price, windowSize):
        weights = np.exp(np.linspace(-1., 0., windowSize))
        weights /= weights.sum()

        a2D = strided_app(price, windowSize, 1)

        returnArray = np.empty((price.shape[0]))
        returnArray.fill(np.nan)
        for index in (range(a2D.shape[0])):
            returnArray[index + windowSize-1] = np.convolve(weights, a2D[index])[windowSize - 1:-windowSize + 1]
        return np.reshape(returnArray, (-1, 1))

    # declare variables
    ibm = pdr.get_data_yahoo(symbols='IBM', start=datetime(2000, 1, 1), end=datetime(2012, 1, 1)).reset_index(drop=True)['Adj Close']
    windowSize = 20

    # get NUMPY exponential weighted moving average
    ewma_np = numpyEWMA(ibm, windowSize)

    print(ewma_np)

but the results do not match those from pandas.

Is there maybe a better approach to calculate the exponential weighted moving average directly in numpy and get the exact same result as the pandas.ewm().mean()?

With 60,000 calls, the pandas solution takes about 230 seconds. I am sure that a pure NumPy version could reduce this significantly.

Answer 1:

Think I have finally cracked it!

Here's a vectorized version of the numpy_ewma function from @RaduS's post, which is claimed to produce the correct results -

    def numpy_ewma_vectorized(data, window):

        alpha = 2 /(window + 1.0)
        alpha_rev = 1-alpha

        scale = 1/alpha_rev
        n = data.shape[0]

        r = np.arange(n)
        scale_arr = scale**r
        offset = data[0]*alpha_rev**(r+1)
        pw0 = alpha*alpha_rev**(n-1)

        mult = data*pw0*scale_arr
        cumsums = mult.cumsum()
        out = offset + cumsums*scale_arr[::-1]
        return out

Further boost

We can boost it further with some code re-use, like so -

    def numpy_ewma_vectorized_v2(data, window):

        alpha = 2 /(window + 1.0)
        alpha_rev = 1-alpha
        n = data.shape[0]

        pows = alpha_rev**(np.arange(n+1))

        scale_arr = 1/pows[:-1]
        offset = data[0]*pows[1:]
        pw0 = alpha*alpha_rev**(n-1)

        mult = data*pw0*scale_arr
        cumsums = mult.cumsum()
        out = offset + cumsums*scale_arr[::-1]
        return out

Runtime test

Let's time these two against the same loopy function for a big dataset.

Around 17x speedup there!
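As a quick sanity check on the cumulative-sum trick, the sketch below runs the same arithmetic as numpy_ewma_vectorized_v2 against a plain loop on a short random array. The helper names ewma_loop and ewma_cumsum, the seed, and the array length are illustrative assumptions, not part of the original answer.

```python
import numpy as np

def ewma_loop(data, window):
    """Reference recursion: y[0] = x[0], y[t] = y[t-1] + alpha * (x[t] - y[t-1])."""
    alpha = 2.0 / (window + 1.0)
    out = np.empty(len(data), dtype=float)
    e = data[0]
    for i, x in enumerate(data):
        e = e + alpha * (x - e)
        out[i] = e
    return out

def ewma_cumsum(data, window):
    """Same arithmetic as numpy_ewma_vectorized_v2, restated so this block runs on its own."""
    alpha = 2.0 / (window + 1.0)
    alpha_rev = 1.0 - alpha
    n = data.shape[0]
    pows = alpha_rev ** np.arange(n + 1)
    scale_arr = 1.0 / pows[:-1]            # (1 / alpha_rev) ** r for r = 0..n-1
    offset = data[0] * pows[1:]            # contribution of the seed value
    pw0 = alpha * alpha_rev ** (n - 1)
    cumsums = (data * pw0 * scale_arr).cumsum()
    return offset + cumsums * scale_arr[::-1]

rng = np.random.default_rng(0)             # fixed seed, arbitrary test data
x = rng.random(200)                        # kept short so the powers stay finite
max_abs_diff = np.max(np.abs(ewma_loop(x, 10) - ewma_cumsum(x, 10)))
print(max_abs_diff)
```

The array is deliberately short: the powers in scale_arr grow geometrically, so the agreement shown here should not be assumed for very long inputs.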



Answer 2:

Here is an implementation using numpy that is equivalent to using df.ewm(alpha=alpha).mean(). After reading the documentation, it is just a few matrix operations. The trick is constructing the right matrices.

It is worth noting that because we are creating float matrices, you can quickly eat through your memory if the input array is too large.

    import pandas as pd
    import numpy as np

    def ewma(x, alpha):
        '''
        Returns the exponentially weighted moving average of x.

        Parameters:
        -----------
        x : array-like
        alpha : float {0

Let's test it:

    alpha = 0.55
    x = np.random.randint(0,30,15)
    df = pd.DataFrame(x, columns=['A'])
    df.ewm(alpha=alpha).mean()

    # returns:
    #             A
    # 0   13.000000
    # 1   22.655172
    # 2   20.443268
    # 3   12.159796
    # 4   14.871955
    # 5   15.497575
    # 6   20.743511
    # 7   20.884818
    # 8   24.250715
    # 9   18.610901
    # 10  17.174686
    # 11  16.528564
    # 12  17.337879
    # 13   7.801912
    # 14  12.310889

    ewma(x=x, alpha=alpha)

    # returns:
    # array([ 13.        ,  22.65517241,  20.44326778,  12.1597964 ,
    #        14.87195534,  15.4975749 ,  20.74351117,  20.88481763,
    #        24.25071484,  18.61090129,  17.17468551,  16.52856393,
    #        17.33787888,   7.80191235,  12.31088889])
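The ewma listing in this answer is cut off in this copy, so the following is only a minimal sketch of the matrix formulation the answer describes, not the author's original code. It assumes the default adjust=True weighting of pandas, where the value k steps back gets weight (1 - alpha)**k and each output is divided by the sum of the weights used. The name ewma_matrix and the sample input are hypothetical.

```python
import numpy as np

def ewma_matrix(x, alpha):
    """Sketch of a matrix-based EWMA, assuming adjust=True style normalized weights."""
    x = np.asarray(x, dtype=float)
    t = np.arange(x.shape[0])
    # lag[row, col] = row - col, i.e. how many steps back column `col` is from row `row`
    lag = t[:, None] - t[None, :]
    # Lower-triangular weight matrix: (1 - alpha) ** lag where lag >= 0, zero elsewhere
    weights = np.where(lag >= 0, (1.0 - alpha) ** np.maximum(lag, 0), 0.0)
    # Weighted sum of past values divided by the sum of the weights in each row
    return weights @ x / weights.sum(axis=1)

result = ewma_matrix([2, 4, 8], alpha=0.5)
print(result)  # approximately 2.0, 3.3333, 6.0
```

The weight matrix has n * n entries, which is the memory cost the answer warns about for large inputs.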


Answer 3:

Given alpha and windowSize, here's an approach to simulate the corresponding behavior on NumPy -

    def numpy_ewm_alpha(a, alpha, windowSize):
        wghts = (1-alpha)**np.arange(windowSize)
        wghts /= wghts.sum()
        # size the output from the input array a, not from a global DataFrame
        out = np.full(a.shape[0],np.nan)
        out[windowSize-1:] = np.convolve(a,wghts,'valid')
        return out

Sample runs for verification -

    In [54]: alpha = 0.55
        ...: windowSize = 20
        ...:

    In [55]: df = pd.DataFrame(np.random.randint(2,9,(100)))

    In [56]: out0 = df.ewm(alpha = alpha, min_periods=windowSize).mean().values.ravel()
        ...: out1 = numpy_ewm_alpha(df.values.ravel(), alpha = alpha, windowSize = windowSize)
        ...: print("Max. error : " + str(np.nanmax(np.abs(out0 - out1))))
        ...:
    Max. error : 5.10531254605e-07

    In [57]: alpha = 0.75
        ...: windowSize = 30
        ...:

    In [58]: out0 = df.ewm(alpha = alpha, min_periods=windowSize).mean().values.ravel()
        ...: out1 = numpy_ewm_alpha(df.values.ravel(), alpha = alpha, windowSize = windowSize)
        ...: print("Max. error : " + str(np.nanmax(np.abs(out0 - out1))))

    Max. error : 8.881784197e-16
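One plausible reading of why the two runs differ so much in error: the convolution only uses the most recent windowSize samples, while pandas keeps weighting the whole history. In the infinite-history limit, the share of normalized weight that falls beyond the window is (1 - alpha)**windowSize. The sketch below evaluates that quantity for the two settings used above; the function name tail_weight is an assumption made for illustration.

```python
import numpy as np

def tail_weight(alpha, window_size):
    """Share of the normalized weights alpha * (1 - alpha)**k that lies at k >= window_size."""
    return (1.0 - alpha) ** window_size

eps = np.finfo(float).eps  # double-precision machine epsilon, for comparison
for alpha, window_size in [(0.55, 20), (0.75, 30)]:
    print(alpha, window_size, tail_weight(alpha, window_size), eps)
```

For alpha = 0.55 and windowSize = 20 the discarded share is on the order of 1e-7, which is in line with the reported error once multiplied by data values between 2 and 8. For alpha = 0.75 and windowSize = 30 it is below machine epsilon, so only rounding error remains.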

Runtime test on bigger dataset -


Further boost

For a further performance boost, we could skip the initialization with NaNs and instead reuse the array returned by np.convolve, like so -

    def numpy_ewm_alpha_v2(a, alpha, windowSize):
        wghts = (1-alpha)**np.arange(windowSize)
        wghts /= wghts.sum()
        out = np.convolve(a,wghts)
        out[:windowSize-1] = np.nan
        return out[:a.size]

Timings -



Answer 4:

@Divakar's answer seems to cause overflow on large inputs, for example:

    numpy_ewma_vectorized(np.random.random(500000), 10)
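The overflow comes from the scale_arr term in that function, which holds (1 / alpha_rev)**r for r up to n - 1 and grows geometrically with the input length. As a rough sketch, the largest length for which those powers stay finite in double precision can be estimated as follows. The name max_safe_length is hypothetical, and the estimate covers only overflow of scale_arr, not loss of precision.

```python
import numpy as np

def max_safe_length(window, dtype=np.float64):
    """Estimate the longest input for which (1 / alpha_rev) ** (n - 1) is still finite."""
    alpha = 2.0 / (window + 1.0)
    alpha_rev = 1.0 - alpha
    # (1 / alpha_rev) ** r overflows once r * log(1 / alpha_rev) exceeds log(largest float)
    largest_exponent = int(np.log(np.finfo(dtype).max) / -np.log(alpha_rev))
    return largest_exponent + 1  # r runs from 0 to n - 1

print(max_safe_length(10))
```

For window = 10 the limit is a few thousand elements, far below the 500000 used in the call above.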

What I have been using is:

    def EMA(input, time_period=10): # For time period = 10
        t_ = time_period - 1
        ema = np.zeros_like(input,dtype=float)
        multiplier = 2.0 / (time_period + 1)
        #multiplier = 1 - multiplier
        for i in range(len(input)):
            # Special Case
            if i > t_:
                ema[i] = (input[i] - ema[i-1]) * multiplier + ema[i-1]
            else:
                ema[i] = np.mean(input[:i+1])
        return ema

However, this is way slower than the pandas solution:

    import numpy as np
    import pandas as pd

    def EMA_fast(X, time_period = 10):
        # pandas.ewma has been removed from pandas; Series.ewm(...).mean() replaces it
        out = np.array(pd.Series(X).ewm(span=time_period, min_periods=time_period).mean())
        out[:time_period-1] = np.cumsum(X[:time_period-1]) / np.asarray(range(1,time_period))
        return out


Answer 5:

Here is another solution I came up with in the meantime. It is about 4 times faster than the pandas solution.

    def numpy_ewma(data, window):
        returnArray = np.empty((data.shape[0]))
        returnArray.fill(np.nan)
        e = data[0]
        alpha = 2 / float(window + 1)
        for s in range(data.shape[0]):
            e =  ((data[s]-e) *alpha ) + e
            returnArray[s] = e
        return returnArray

I used this formula as a starting point. I am sure it can be improved further, but it is at least a start.
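For readers comparing this loop with pandas: the recursion e = e + alpha * (x - e), seeded with the first value, is what the pandas documentation describes for ewm(..., adjust=False). The default adjust=True instead divides by the running sum of the weights, so early values differ. The small worked example below contrasts the two. Both function names are illustrative assumptions, and ewma_normalized is a plain-Python restatement of the documented weighting, not pandas code.

```python
import numpy as np

def ewma_recursive(x, alpha):
    """y[0] = x[0]; y[t] = (1 - alpha) * y[t-1] + alpha * x[t]  (adjust=False style)."""
    x = np.asarray(x, dtype=float)
    out = np.empty_like(x)
    out[0] = x[0]
    for t in range(1, x.shape[0]):
        out[t] = (1.0 - alpha) * out[t - 1] + alpha * x[t]
    return out

def ewma_normalized(x, alpha):
    """Weighted average with weight (1 - alpha)**k on the value k steps back (adjust=True style)."""
    x = np.asarray(x, dtype=float)
    out = np.empty_like(x)
    num = 0.0  # running weighted sum of values
    den = 0.0  # running sum of weights
    for t, value in enumerate(x):
        num = (1.0 - alpha) * num + value
        den = (1.0 - alpha) * den + 1.0
        out[t] = num / den
    return out

x = [1.0, 2.0, 3.0]
print(ewma_recursive(x, 0.5))   # 1.0, 1.5, 2.25
print(ewma_normalized(x, 0.5))  # 1.0, then about 1.6667 and 2.4286
```

The two variants agree on the first element and move closer together as more samples accumulate, since the sum of the weights approaches 1 / alpha.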



Answer 6:

This answer may seem irrelevant. But, for those who also need to calculate the exponentially weighted variance (and also standard deviation) with numpy, the following solution will be useful:

    import numpy as np

    def ew(a, alpha, winSize):
        _alpha = 1 - alpha
        ws = _alpha ** np.arange(winSize)
        w_sum = ws.sum()
        # divide by the weight sum so this is a weighted mean rather than a weighted sum
        ew_mean = np.convolve(a, ws)[winSize - 1] / w_sum
        bias = (w_sum ** 2) / ((w_sum ** 2) - (ws ** 2).sum())
        ew_var = (np.convolve((a - ew_mean) ** 2, ws)[winSize - 1] / w_sum) * bias
        ew_std = np.sqrt(ew_var)
        return (ew_mean, ew_var, ew_std)

