
Question:

How can I get the exponential weighted moving average in NumPy, just like the following in pandas?

    import pandas as pd
    import pandas_datareader as pdr
    from datetime import datetime

    # declare variables
    ibm = pdr.get_data_yahoo(symbols='IBM', start=datetime(2000, 1, 1), end=datetime(2012, 1, 1)).reset_index(drop=True)['Adj Close']
    windowSize = 20

    # get PANDAS exponential weighted moving average
    # (.values instead of .as_matrix(), which newer pandas versions no longer provide)
    ewm_pd = pd.DataFrame(ibm).ewm(span=windowSize, min_periods=windowSize).mean().values

    print(ewm_pd)

I tried the following with NumPy:

    import numpy as np
    import pandas_datareader as pdr
    from datetime import datetime

    # From this post: http://stackoverflow.com/a/40085052/3293881 by @Divakar
    def strided_app(a, L, S):  # Window len = L, Stride len/stepsize = S
        nrows = ((a.size - L) // S) + 1
        n = a.strides[0]
        return np.lib.stride_tricks.as_strided(a, shape=(nrows, L), strides=(S * n, n))

    def numpyEWMA(price, windowSize):
        weights = np.exp(np.linspace(-1., 0., windowSize))
        weights /= weights.sum()

        a2D = strided_app(price, windowSize, 1)

        returnArray = np.empty((price.shape[0]))
        returnArray.fill(np.nan)
        for index in (range(a2D.shape[0])):
            returnArray[index + windowSize-1] = np.convolve(weights, a2D[index])[windowSize - 1:-windowSize + 1]
        return np.reshape(returnArray, (-1, 1))

    # declare variables
    ibm = pdr.get_data_yahoo(symbols='IBM', start=datetime(2000, 1, 1), end=datetime(2012, 1, 1)).reset_index(drop=True)['Adj Close']
    windowSize = 20

    # get NUMPY exponential weighted moving average
    ewma_np = numpyEWMA(ibm, windowSize)

    print(ewma_np)

But the results do not match the ones from pandas.

Is there a better approach to calculate the exponential weighted moving average directly in NumPy and get exactly the same result as pandas.ewm().mean()?

With 60,000 requests, the pandas solution takes about 230 seconds. I am sure that a pure NumPy version can reduce this significantly.

Answer 1:

Think I have finally cracked it!

Here's a vectorized version of the numpy_ewma function from @RaduS's post, which is claimed to produce the correct results -

    def numpy_ewma_vectorized(data, window):

        alpha = 2 /(window + 1.0)
        alpha_rev = 1-alpha

        scale = 1/alpha_rev
        n = data.shape[0]

        r = np.arange(n)
        scale_arr = scale**r
        offset = data[0]*alpha_rev**(r+1)
        pw0 = alpha*alpha_rev**(n-1)

        mult = data*pw0*scale_arr
        cumsums = mult.cumsum()
        out = offset + cumsums*scale_arr[::-1]
        return out

Further boost

We can boost it further with some code re-use, like so -

    def numpy_ewma_vectorized_v2(data, window):

        alpha = 2 /(window + 1.0)
        alpha_rev = 1-alpha
        n = data.shape[0]

        pows = alpha_rev**(np.arange(n+1))

        scale_arr = 1/pows[:-1]
        offset = data[0]*pows[1:]
        pw0 = alpha*alpha_rev**(n-1)

        mult = data*pw0*scale_arr
        cumsums = mult.cumsum()
        out = offset + cumsums*scale_arr[::-1]
        return out
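As a quick sanity check, the vectorized form can be compared with the plain recursion on a short input. The sketch below restates both under the hypothetical names ewma_loop and ewma_vectorized so that it runs on its own; the sample length of 200 is an arbitrary choice, kept small because very long inputs can overflow.

```python
import numpy as np

def ewma_loop(data, window):
    # Plain recursion e_t = e_{t-1} + alpha * (x_t - e_{t-1}), seeded with x_0.
    alpha = 2.0 / (window + 1.0)
    out = np.empty(len(data), dtype=float)
    e = data[0]
    for i, x in enumerate(data):
        e = e + alpha * (x - e)
        out[i] = e
    return out

def ewma_vectorized(data, window):
    # Same algebra as the cumulative-sum version, restated to be self-contained.
    alpha = 2.0 / (window + 1.0)
    alpha_rev = 1.0 - alpha
    n = data.shape[0]
    pows = alpha_rev ** np.arange(n + 1)
    scale_arr = 1.0 / pows[:-1]
    offset = data[0] * pows[1:]
    pw0 = alpha * alpha_rev ** (n - 1)
    cumsums = (data * pw0 * scale_arr).cumsum()
    return offset + cumsums * scale_arr[::-1]

rng = np.random.default_rng(0)
sample = rng.random(200)  # arbitrary short test input
max_diff = np.max(np.abs(ewma_loop(sample, 20) - ewma_vectorized(sample, 20)))
print(max_diff)
```

Both functions compute the same weighted sum, so any difference should be at the level of floating-point rounding.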

Runtime test

Let's time these two against the same loopy function for a big dataset.

Around 17x speedup there!

Answer 2:

Here is an implementation using numpy that is equivalent to using df.ewm(alpha=alpha).mean(). After reading the documentation, it is just a few matrix operations. The trick is constructing the right matrices.

It is worth noting that because we are creating float matrices, you can quickly eat through your memory if the input array is too large.

    import pandas as pd
    import numpy as np

    def ewma(x, alpha):
        '''
        Returns the exponentially weighted moving average of x.

        Parameters:
        -----------
        x : array-like
        alpha : float {0

Let's test it:
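For illustration, here is a minimal sketch of the matrix idea described above. It is not the answer's original code: the function name ewma_matrix_sketch is hypothetical, and it assumes the weighted-average form that pandas uses by default (adjust=True). It builds an n x n matrix of weights, so memory grows quadratically with the input length.

```python
import numpy as np

def ewma_matrix_sketch(x, alpha):
    # Hypothetical helper, assuming the adjust=True definition:
    # y_t = sum_k (1-alpha)**(t-k) * x_k / sum_k (1-alpha)**(t-k), for k <= t.
    x = np.asarray(x, dtype=float)
    n = x.size
    # lag[t, k] = t - k; only entries with k <= t contribute to output t.
    lag = np.arange(n)[:, None] - np.arange(n)[None, :]
    weights = np.where(lag >= 0, (1.0 - alpha) ** np.clip(lag, 0, None), 0.0)
    # Each row is one weighted average: weighted sum divided by the sum of weights.
    return weights @ x / weights.sum(axis=1)

# The inputs 13, 27, 19 are chosen to be consistent with the first three
# rows of the pandas output quoted in this answer for alpha = 0.55.
result = ewma_matrix_sketch([13, 27, 19], alpha=0.55)
print(result)
```

With these three inputs the sketch gives 13, about 22.655172 and about 20.443268.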

    alpha = 0.55
    x = np.random.randint(0,30,15)
    df = pd.DataFrame(x, columns=['A'])
    df.ewm(alpha=alpha).mean()

    # returns:
    #             A
    # 0   13.000000
    # 1   22.655172
    # 2   20.443268
    # 3   12.159796
    # 4   14.871955
    # 5   15.497575
    # 6   20.743511
    # 7   20.884818
    # 8   24.250715
    # 9   18.610901
    # 10  17.174686
    # 11  16.528564
    # 12  17.337879
    # 13   7.801912
    # 14  12.310889

    ewma(x=x, alpha=alpha)

    # returns:
    # array([ 13.        ,  22.65517241,  20.44326778,  12.1597964 ,
    #        14.87195534,  15.4975749 ,  20.74351117,  20.88481763,
    #        24.25071484,  18.61090129,  17.17468551,  16.52856393,
    #        17.33787888,   7.80191235,  12.31088889])

Answer 3:

Given alpha and windowSize, here's an approach to simulate the corresponding behavior on NumPy -

    def numpy_ewm_alpha(a, alpha, windowSize):
        wghts = (1-alpha)**np.arange(windowSize)
        wghts /= wghts.sum()
        # size the output from the input array a, not from a global df
        out = np.full(a.shape[0],np.nan)
        out[windowSize-1:] = np.convolve(a,wghts,'valid')
        return out

Sample runs for verification -

    In [54]: alpha = 0.55
        ...: windowSize = 20
        ...:

    In [55]: df = pd.DataFrame(np.random.randint(2,9,(100)))

    In [56]: out0 = df.ewm(alpha = alpha, min_periods=windowSize).mean().as_matrix().ravel()
        ...: out1 = numpy_ewm_alpha(df.values.ravel(), alpha = alpha, windowSize = windowSize)
        ...: print "Max. error : " + str(np.nanmax(np.abs(out0 - out1)))
        ...:
    Max. error : 5.10531254605e-07

    In [57]: alpha = 0.75
        ...: windowSize = 30
        ...:

    In [58]: out0 = df.ewm(alpha = alpha, min_periods=windowSize).mean().as_matrix().ravel()
        ...: out1 = numpy_ewm_alpha(df.values.ravel(), alpha = alpha, windowSize = windowSize)
        ...: print "Max. error : " + str(np.nanmax(np.abs(out0 - out1)))

    Max. error : 8.881784197e-16

Runtime test on bigger dataset -

Further boost

For further performance boost we could avoid the initialization with NaNs and instead use the array outputted from np.convolve, like so -

    def numpy_ewm_alpha_v2(a, alpha, windowSize):
        wghts = (1-alpha)**np.arange(windowSize)
        wghts /= wghts.sum()
        out = np.convolve(a,wghts)
        out[:windowSize-1] = np.nan
        return out[:a.size]
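One possible reading of the two error figures in the sample runs: the convolution only keeps the last windowSize weights, so it is an approximation of the full weighted average, and the share of weight it drops is (1-alpha)**windowSize. The sketch below just evaluates that share for the two settings used above; the helper name dropped_weight_share is hypothetical.

```python
import numpy as np

def dropped_weight_share(alpha, window_size):
    # Hypothetical helper: for geometric weights (1-alpha)**j, j = 0, 1, 2, ...,
    # the weights with j >= window_size sum to (1-alpha)**window_size / alpha,
    # while all weights sum to 1 / alpha, so the dropped share is their ratio.
    return (1.0 - alpha) ** window_size

share_a = dropped_weight_share(0.55, 20)
share_b = dropped_weight_share(0.75, 30)
print(share_a, share_b)
```

The first share is on the order of 1e-7 and the second is far below double-precision resolution, which fits the pattern of a visible error in the first run and a rounding-level error in the second.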

Timings -

Answer 4:

@Divakar's answer seems to overflow on large inputs such as:

numpy_ewma_vectorized(np.random.random(500000), 10)
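A rough way to see where the overflow comes from, assuming the scale_arr = scale**r step of numpy_ewma_vectorized is the culprit: with window = 10 the scale factor is 11/9, and its powers exceed the float64 range after a few thousand elements. The variable names below are only for illustration.

```python
import numpy as np

window = 10
alpha = 2.0 / (window + 1.0)
scale = 1.0 / (1.0 - alpha)  # 11/9 for window = 10

# Largest exponent r for which scale**r still fits into float64.
max_len = int(np.log(np.finfo(np.float64).max) / np.log(scale))

# Reproduce the power array for 500000 elements, silencing the overflow warning.
with np.errstate(over="ignore"):
    scale_arr = scale ** np.arange(500000)

print(max_len, np.isinf(scale_arr).any())
```

For this window the limit is roughly 3500 elements, so one option would be to process long inputs in chunks shorter than that and carry the last output over as the next chunk's starting value.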

What I have been using is:

    def EMA(input, time_period=10): # For time period = 10
        t_ = time_period - 1
        ema = np.zeros_like(input,dtype=float)
        multiplier = 2.0 / (time_period + 1)
        #multiplier = 1 - multiplier
        for i in range(len(input)):
            # Special Case
            if i > t_:
                ema[i] = (input[i] - ema[i-1]) * multiplier + ema[i-1]
            else:
                ema[i] = np.mean(input[:i+1])
        return ema

However, this is way slower than the pandas solution:

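    import numpy as np
    import pandas as pd

    def EMA_fast(X, time_period = 10):
        # Series.ewm(...).mean() stands in for the old top-level pandas.ewma function,
        # which newer pandas versions no longer provide
        out = np.array(pd.Series(X).ewm(span=time_period, min_periods=time_period).mean())
        out[:time_period-1] = np.cumsum(X[:time_period-1]) / np.arange(1, time_period)
        return out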

Answer 5:

Here is another solution I came up with in the meantime; it is about 4 times faster than the pandas solution.

    def numpy_ewma(data, window):
        returnArray = np.empty((data.shape[0]))
        returnArray.fill(np.nan)
        e = data[0]
        alpha = 2 / float(window + 1)
        for s in range(data.shape[0]):
            e =  ((data[s]-e) *alpha ) + e
            returnArray[s] = e
        return returnArray
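Note that this recursion is the form pandas uses with adjust=False, while the question calls ewm() with its default adjust=True, which divides a weighted sum by the sum of the weights. The two differ mostly for the first few elements. A minimal sketch of the same loop adapted to the adjust=True definition, under the hypothetical name ewma_adjusted_loop, could look like this:

```python
import numpy as np

def ewma_adjusted_loop(data, window):
    # Hypothetical helper: same loop structure as numpy_ewma, but keeping a
    # running weighted sum and a running sum of weights (adjust=True style).
    alpha = 2.0 / (window + 1.0)
    decay = 1.0 - alpha
    out = np.empty(len(data), dtype=float)
    num = 0.0  # running sum of decay**i * x_{t-i}
    den = 0.0  # running sum of decay**i
    for t in range(len(data)):
        num = data[t] + decay * num
        den = 1.0 + decay * den
        out[t] = num / den
    return out

# window = 3 gives alpha = 0.5, which keeps the arithmetic easy to follow.
result = ewma_adjusted_loop(np.array([1.0, 2.0, 3.0]), window=3)
print(result)
```

For the three inputs above the outputs are 1, 2.5/1.5 and 4.25/1.75, which can be checked by hand against the weighted-average definition.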

I used this formula as a starting point. I am sure it can be improved further, but it is at least a start.

Answer 6:

This answer may seem irrelevant. But, for those who also need to calculate the exponentially weighted variance (and also standard deviation) with numpy, the following solution will be useful:

    import numpy as np

    def ew(a, alpha, winSize):
        _alpha = 1 - alpha
        ws = _alpha ** np.arange(winSize)
        w_sum = ws.sum()
        # weighted mean over the first winSize samples: weighted sum divided by the sum of weights
        ew_mean = np.convolve(a, ws)[winSize - 1] / w_sum
        bias = (w_sum ** 2) / ((w_sum ** 2) - (ws ** 2).sum())
        ew_var = (np.convolve((a - ew_mean) ** 2, ws)[winSize - 1] / w_sum) * bias
        ew_std = np.sqrt(ew_var)
        return (ew_mean, ew_var, ew_std)

Source: Numpy version of “Exponential weighted moving average”, equivalent to pandas.ewm().mean()

Tags

ema