How do I get the exponential weighted moving average in NumPy just like the following in pandas?
import pandas as pd
import pandas_datareader as pdr
from dat
@Divakar's answer seems to cause overflow when dealing with
numpy_ewma_vectorized(np.random.random(500000), 10)
What I have been using is:
def EMA(input, time_period=10): # For time period = 10
t_ = time_period - 1
ema = np.zeros_like(input,dtype=float)
multiplier = 2.0 / (time_period + 1)
#multiplier = 1 - multiplier
for i in range(len(input)):
# Special Case
if i > t_:
ema[i] = (input[i] - ema[i-1]) * multiplier + ema[i-1]
else:
ema[i] = np.mean(input[:i+1])
return ema
However, this is way slower than the panda solution:
from pandas import ewma as pd_ema
def EMA_fast(X, time_period = 10):
out = pd_ema(X, span=time_period, min_periods=time_period)
out[:time_period-1] = np.cumsum(X[:time_period-1]) / np.asarray(range(1,time_period))
return out
A very simple solution that avoids numba but that is almost as fast as Alexander McFarlane's solution, especially for large arrays and large window
sizes, is to use scipy's lfilter
function (because an EWMA is a linear filter):
from scipy.signal import lfiltic, lfilter
# careful not to mix between scipy.signal and standard python signal
# (https://docs.python.org/3/library/signal.html) if your code handles some processes
def ewma_linear_filter(array, window):
alpha = 2 /(window + 1)
b = [alpha]
a = [1, alpha-1]
zi = lfiltic(b, a, array[0:1], [0])
return lfilter(b, a, array, zi=zi)[0]
Timings are as follows:
n = 10_000_000
window = 100_000
data = np.random.normal(0, 1, n)
%timeit _ewma_infinite_hist(data, window)
%timeit linear_filter(data, window)
86 ms ± 1.1 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
92.6 ms ± 751 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
Here is an implementation using NumPy that is equivalent to using df.ewm(alpha=alpha).mean()
. After reading the documentation, it is just a few matrix operations. The trick is constructing the right matrices.
It is worth noting that because we are creating float matrices, you can quickly eat through your memory if the input array is too large.
import pandas as pd
import numpy as np
def ewma(x, alpha):
'''
Returns the exponentially weighted moving average of x.
Parameters:
-----------
x : array-like
alpha : float {0 <= alpha <= 1}
Returns:
--------
ewma: numpy array
the exponentially weighted moving average
'''
# Coerce x to an array
x = np.array(x)
n = x.size
# Create an initial weight matrix of (1-alpha), and a matrix of powers
# to raise the weights by
w0 = np.ones(shape=(n,n)) * (1-alpha)
p = np.vstack([np.arange(i,i-n,-1) for i in range(n)])
# Create the weight matrix
w = np.tril(w0**p,0)
# Calculate the ewma
return np.dot(w, x[::np.newaxis]) / w.sum(axis=1)
Let's test its:
alpha = 0.55
x = np.random.randint(0,30,15)
df = pd.DataFrame(x, columns=['A'])
df.ewm(alpha=alpha).mean()
# returns:
# A
# 0 13.000000
# 1 22.655172
# 2 20.443268
# 3 12.159796
# 4 14.871955
# 5 15.497575
# 6 20.743511
# 7 20.884818
# 8 24.250715
# 9 18.610901
# 10 17.174686
# 11 16.528564
# 12 17.337879
# 13 7.801912
# 14 12.310889
ewma(x=x, alpha=alpha)
# returns:
# array([ 13. , 22.65517241, 20.44326778, 12.1597964 ,
# 14.87195534, 15.4975749 , 20.74351117, 20.88481763,
# 24.25071484, 18.61090129, 17.17468551, 16.52856393,
# 17.33787888, 7.80191235, 12.31088889])
Given alpha
and windowSize
, here's an approach to simulate the corresponding behavior on NumPy -
def numpy_ewm_alpha(a, alpha, windowSize):
wghts = (1-alpha)**np.arange(windowSize)
wghts /= wghts.sum()
out = np.full(df.shape[0],np.nan)
out[windowSize-1:] = np.convolve(a,wghts,'valid')
return out
Sample runs for verification -
In [54]: alpha = 0.55
...: windowSize = 20
...:
In [55]: df = pd.DataFrame(np.random.randint(2,9,(100)))
In [56]: out0 = df.ewm(alpha = alpha, min_periods=windowSize).mean().as_matrix().ravel()
...: out1 = numpy_ewm_alpha(df.values.ravel(), alpha = alpha, windowSize = windowSize)
...: print "Max. error : " + str(np.nanmax(np.abs(out0 - out1)))
...:
Max. error : 5.10531254605e-07
In [57]: alpha = 0.75
...: windowSize = 30
...:
In [58]: out0 = df.ewm(alpha = alpha, min_periods=windowSize).mean().as_matrix().ravel()
...: out1 = numpy_ewm_alpha(df.values.ravel(), alpha = alpha, windowSize = windowSize)
...: print "Max. error : " + str(np.nanmax(np.abs(out0 - out1)))
Max. error : 8.881784197e-16
Runtime test on bigger dataset -
In [61]: alpha = 0.55
...: windowSize = 20
...:
In [62]: df = pd.DataFrame(np.random.randint(2,9,(10000)))
In [63]: %timeit df.ewm(alpha = alpha, min_periods=windowSize).mean()
1000 loops, best of 3: 851 µs per loop
In [64]: %timeit numpy_ewm_alpha(df.values.ravel(), alpha = alpha, windowSize = windowSize)
1000 loops, best of 3: 204 µs per loop
Further boost
For further performance boost we could avoid the initialization with NaNs and instead use the array outputted from np.convolve
, like so -
def numpy_ewm_alpha_v2(a, alpha, windowSize):
wghts = (1-alpha)**np.arange(windowSize)
wghts /= wghts.sum()
out = np.convolve(a,wghts)
out[:windowSize-1] = np.nan
return out[:a.size]
Timings -
In [117]: alpha = 0.55
...: windowSize = 20
...:
In [118]: df = pd.DataFrame(np.random.randint(2,9,(10000)))
In [119]: %timeit numpy_ewm_alpha(df.values.ravel(), alpha = alpha, windowSize = windowSize)
1000 loops, best of 3: 204 µs per loop
In [120]: %timeit numpy_ewm_alpha_v2(df.values.ravel(), alpha = alpha, windowSize = windowSize)
10000 loops, best of 3: 195 µs per loop
This answer may seem irrelevant. But, for those who also need to calculate the exponentially weighted variance (and also standard deviation) with NumPy, the following solution will be useful:
import numpy as np
def ew(a, alpha, winSize):
_alpha = 1 - alpha
ws = _alpha ** np.arange(winSize)
w_sum = ws.sum()
ew_mean = np.convolve(a, ws)[winSize - 1] / w_sum
bias = (w_sum ** 2) / ((w_sum ** 2) - (ws ** 2).sum())
ew_var = (np.convolve((a - ew_mean) ** 2, ws)[winSize - 1] / w_sum) * bias
ew_std = np.sqrt(ew_var)
return (ew_mean, ew_var, ew_std)
Here is another solution O came up with in the meantime. It is about four times faster than the pandas solution.
def numpy_ewma(data, window):
returnArray = np.empty((data.shape[0]))
returnArray.fill(np.nan)
e = data[0]
alpha = 2 / float(window + 1)
for s in range(data.shape[0]):
e = ((data[s]-e) *alpha ) + e
returnArray[s] = e
return returnArray
I used this formula as a starting point. I am sure that this can be improved even more, but it is at least a starting point.