NumPy version of “Exponential weighted moving average”, equivalent to pandas.ewm().mean()

一生所求 2020-11-27 12:30

How do I get the exponentially weighted moving average in NumPy, just like the following in pandas?

import pandas as pd
import pandas_datareader as pdr
from dat         


        
12 answers
  • 2020-11-27 12:49

    @Divakar's answer seems to cause overflow when dealing with

    numpy_ewma_vectorized(np.random.random(500000), 10)
    

    What I have been using is:

    import numpy as np

    def EMA(input, time_period=10):
        t_ = time_period - 1
        ema = np.zeros_like(input, dtype=float)
        multiplier = 2.0 / (time_period + 1)  # smoothing factor derived from the period
        for i in range(len(input)):
            if i > t_:
                # Standard recursive EMA update
                ema[i] = (input[i] - ema[i-1]) * multiplier + ema[i-1]
            else:
                # Special case (warm-up): simple mean of the first i+1 values
                ema[i] = np.mean(input[:i+1])
        return ema
    

    However, this is much slower than the pandas solution:

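    import numpy as np
    import pandas as pd

    def EMA_fast(X, time_period=10):
        # pd.ewma() has been removed from pandas; Series.ewm() is the current API
        out = pd.Series(X).ewm(span=time_period, min_periods=time_period).mean().to_numpy(copy=True)
        # Fill the warm-up region with the running (cumulative) mean
        out[:time_period-1] = np.cumsum(X[:time_period-1]) / np.arange(1, time_period)
        return out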
    
  • 2020-11-27 12:52

    A very simple solution that avoids numba but that is almost as fast as Alexander McFarlane's solution, especially for large arrays and large window sizes, is to use scipy's lfilter function (because an EWMA is a linear filter):

    from scipy.signal import lfiltic, lfilter
    # Note: scipy.signal is unrelated to Python's standard-library `signal` module
    # (https://docs.python.org/3/library/signal.html); don't mix them up if your
    # code also handles OS signals.
    
    def ewma_linear_filter(array, window):
        alpha = 2 / (window + 1)
        # Filter coefficients for y[n] = alpha * x[n] + (1 - alpha) * y[n-1]
        b = [alpha]
        a = [1, alpha - 1]
        # Initial state chosen so that y[-1] = array[0]
        zi = lfiltic(b, a, array[0:1], [0])
        return lfilter(b, a, array, zi=zi)[0]
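    To see why `b = [alpha]` and `a = [1, alpha-1]` describe an EWMA: `lfilter` implements a[0]·y[n] + a[1]·y[n-1] = b[0]·x[n], so these coefficients give the recursion and transfer function

```latex
y[n] = \alpha\, x[n] + (1 - \alpha)\, y[n-1],
\qquad
H(z) = \frac{\alpha}{1 + (\alpha - 1)\, z^{-1}},
\qquad
\alpha = \frac{2}{\text{window} + 1}.
```

    `lfiltic` sets the initial state so that y[-1] = x[0], hence y[0] = αx[0] + (1-α)x[0] = x[0]. This is the same recursion pandas uses for `ewm(span=window, adjust=False).mean()`, not the default `adjust=True`.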
    

    Timings are as follows:

    n = 10_000_000
    window = 100_000
    data = np.random.normal(0, 1, n)
    
    %timeit _ewma_infinite_hist(data, window)
    %timeit ewma_linear_filter(data, window)
    
    86 ms ± 1.1 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
    92.6 ms ± 751 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
    
  • 2020-11-27 12:53

    Here is an implementation using NumPy that is equivalent to using df.ewm(alpha=alpha).mean(). Once you read the documentation, it comes down to a few matrix operations; the trick is constructing the right matrices.

    It is worth noting that because this builds n × n float matrices, memory grows quadratically with the input length: for n = 100,000, a single float64 matrix already needs 10^10 × 8 bytes = 80 GB.
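    Concretely, writing W_ij for the weight that output i gives to input j, the matrices built below encode

```latex
W_{ij} =
\begin{cases}
(1-\alpha)^{\,i-j}, & j \le i \\
0, & j > i
\end{cases}
\qquad
y_i = \frac{\sum_{j=0}^{n-1} W_{ij}\, x_j}{\sum_{j=0}^{n-1} W_{ij}}
    = \frac{\sum_{j=0}^{i} (1-\alpha)^{\,i-j}\, x_j}{\sum_{j=0}^{i} (1-\alpha)^{\,i-j}}
```

    which is the weighting pandas' `ewm(alpha=alpha).mean()` uses with its default `adjust=True`: the row sums in the denominator normalise by the total weight seen up to position i.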

    import pandas as pd
    import numpy as np
    
    def ewma(x, alpha):
        '''
        Returns the exponentially weighted moving average of x.
    
        Parameters:
        -----------
        x : array-like
        alpha : float {0 <= alpha <= 1}
    
        Returns:
        --------
        ewma: numpy array
              the exponentially weighted moving average
        '''
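        # Coerce x to a float array
        x = np.asarray(x, dtype=float)
        n = x.size
    
        # p[i, j] = i - j: how many steps x[j] lies behind position i
        p = np.subtract.outer(np.arange(n), np.arange(n))
    
        # Lower-triangular weight matrix: w[i, j] = (1-alpha)**(i-j) for j <= i.
        # Negative ("future") exponents are clipped to 0 before np.tril zeroes them,
        # which avoids overflow from raising (1-alpha) to large negative powers.
        w = np.tril((1 - alpha) ** np.clip(p, 0, None))
    
        # Each output is the weighted average of the current and all past values
        return w @ x / w.sum(axis=1)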
    

    Let's test it:

    alpha = 0.55
    x = np.random.randint(0,30,15)
    df = pd.DataFrame(x, columns=['A'])
    df.ewm(alpha=alpha).mean()
    
    # returns:
    #             A
    # 0   13.000000
    # 1   22.655172
    # 2   20.443268
    # 3   12.159796
    # 4   14.871955
    # 5   15.497575
    # 6   20.743511
    # 7   20.884818
    # 8   24.250715
    # 9   18.610901
    # 10  17.174686
    # 11  16.528564
    # 12  17.337879
    # 13   7.801912
    # 14  12.310889
    
    ewma(x=x, alpha=alpha)
    
    # returns:
    # array([ 13.        ,  22.65517241,  20.44326778,  12.1597964 ,
    #        14.87195534,  15.4975749 ,  20.74351117,  20.88481763,
    #        24.25071484,  18.61090129,  17.17468551,  16.52856393,
    #        17.33787888,   7.80191235,  12.31088889])
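    Because `ewma` materialises a full n × n matrix, here is a minimal sketch of an O(n)-memory alternative (not part of the original answer; the name `ewma_adjusted` is hypothetical). It keeps two running sums, the decayed sum of values and the decayed sum of weights, whose ratio gives the same adjust=True-style average. It uses a plain Python loop, so it is not fast, but its memory use is linear in n.

```python
import numpy as np


def ewma_adjusted(x, alpha):
    """EWMA with adjust=True-style weights, using O(n) memory.

    num_t = x_t + (1 - alpha) * num_{t-1}   (decayed sum of values)
    den_t = 1   + (1 - alpha) * den_{t-1}   (decayed sum of weights)
    num_t / den_t = sum_j (1-alpha)**(t-j) * x_j / sum_j (1-alpha)**(t-j)
    """
    x = np.asarray(x, dtype=float)
    decay = 1 - alpha
    out = np.empty_like(x)
    num = 0.0
    den = 0.0
    for t, xt in enumerate(x):
        num = xt + decay * num
        den = 1.0 + decay * den
        out[t] = num / den
    return out
```

    With the `x` and `alpha` from the test above, `ewma_adjusted(x, alpha)` should agree with `ewma(x=x, alpha=alpha)` up to floating-point rounding.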
    
  • 2020-11-27 12:56

    Given alpha and windowSize, here's an approach that approximates the corresponding behavior in NumPy by keeping only the last windowSize weights -

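    import numpy as np

    def numpy_ewm_alpha(a, alpha, windowSize):
        # Truncated weights (1-alpha)**k for the last windowSize samples, normalised to sum to 1
        wghts = (1-alpha)**np.arange(windowSize)
        wghts /= wghts.sum()
        out = np.full(a.shape[0], np.nan)
        # 'valid' mode yields one value per full window, starting at index windowSize-1
        out[windowSize-1:] = np.convolve(a, wghts, 'valid')
        return out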
    

    Sample runs for verification -

    In [54]: alpha = 0.55
        ...: windowSize = 20
        ...: 
    
    In [55]: df = pd.DataFrame(np.random.randint(2,9,(100)))
    
    In [56]: out0 = df.ewm(alpha = alpha, min_periods=windowSize).mean().to_numpy().ravel()
        ...: out1 = numpy_ewm_alpha(df.values.ravel(), alpha = alpha, windowSize = windowSize)
        ...: print("Max. error : " + str(np.nanmax(np.abs(out0 - out1))))
        ...: 
    Max. error : 5.10531254605e-07
    
    In [57]: alpha = 0.75
        ...: windowSize = 30
        ...: 
    
    In [58]: out0 = df.ewm(alpha = alpha, min_periods=windowSize).mean().to_numpy().ravel()
        ...: out1 = numpy_ewm_alpha(df.values.ravel(), alpha = alpha, windowSize = windowSize)
        ...: print("Max. error : " + str(np.nanmax(np.abs(out0 - out1))))
    
    Max. error : 8.881784197e-16
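    Because `numpy_ewm_alpha` drops every weight after `windowSize` terms, the share of the infinite-history weight it discards is (1-alpha)**windowSize. That is consistent with the much smaller error for alpha = 0.75, windowSize = 30 (0.25**30 ≈ 8.7e-19) than for alpha = 0.55, windowSize = 20 (0.45**20 ≈ 1.2e-7). A minimal sketch, assuming you would rather pick the window from a tolerance than by hand (the helper `min_window_for_tolerance` is hypothetical):

```python
import math


def min_window_for_tolerance(alpha, tol):
    """Smallest window W with (1 - alpha)**W <= tol, for 0 < alpha < 1 and 0 < tol < 1.

    (1 - alpha)**W is the fraction of the total (infinite-history) EWMA weight
    lying beyond the last W samples, i.e. what a truncated kernel throws away.
    """
    # (1 - alpha)**W <= tol  <=>  W >= log(tol) / log(1 - alpha)   (both logs are negative)
    return math.ceil(math.log(tol) / math.log(1 - alpha))


print(min_window_for_tolerance(0.55, 1e-6))  # 18
```

    The discarded weight bounds how far the truncated average can drift from the full one relative to the data's scale, so in practice tol should be chosen with the magnitude of your values in mind.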
    

    Runtime test on bigger dataset -

    In [61]: alpha = 0.55
        ...: windowSize = 20
        ...: 
    
    In [62]: df = pd.DataFrame(np.random.randint(2,9,(10000)))
    
    In [63]: %timeit df.ewm(alpha = alpha, min_periods=windowSize).mean()
    1000 loops, best of 3: 851 µs per loop
    
    In [64]: %timeit numpy_ewm_alpha(df.values.ravel(), alpha = alpha, windowSize = windowSize)
    1000 loops, best of 3: 204 µs per loop
    

    Further boost

    For a further performance boost, we could skip the NaN initialization and instead reuse the array returned by np.convolve (in its default 'full' mode), like so -

    def numpy_ewm_alpha_v2(a, alpha, windowSize):
        wghts = (1-alpha)**np.arange(windowSize)
        wghts /= wghts.sum()
        out = np.convolve(a,wghts)
        out[:windowSize-1] = np.nan
        return out[:a.size]  
    

    Timings -

    In [117]: alpha = 0.55
         ...: windowSize = 20
         ...: 
    
    In [118]: df = pd.DataFrame(np.random.randint(2,9,(10000)))
    
    In [119]: %timeit numpy_ewm_alpha(df.values.ravel(), alpha = alpha, windowSize = windowSize)
    1000 loops, best of 3: 204 µs per loop
    
    In [120]: %timeit numpy_ewm_alpha_v2(df.values.ravel(), alpha = alpha, windowSize = windowSize)
    10000 loops, best of 3: 195 µs per loop
    
  • 2020-11-27 12:58

    This answer may seem slightly off-topic, but for those who also need the exponentially weighted variance (and standard deviation) in NumPy, the following may be useful:

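    import numpy as np
    
    def ew(a, alpha, winSize):
        """EW mean, variance and std of the window a[:winSize] (scalars, not rolling arrays).

        The most recent sample a[winSize-1] gets weight 1, a[winSize-2] gets
        (1 - alpha), a[winSize-3] gets (1 - alpha)**2, and so on.
        """
        _alpha = 1 - alpha
        ws = _alpha ** np.arange(winSize)
        w_sum = ws.sum()
        # Index winSize-1 of the full convolution covers exactly a[0..winSize-1]
        ew_mean = np.convolve(a, ws)[winSize - 1] / w_sum
        # Correction factor for an unbiased weighted variance
        bias = (w_sum ** 2) / ((w_sum ** 2) - (ws ** 2).sum())
        ew_var = (np.convolve((a - ew_mean) ** 2, ws)[winSize - 1] / w_sum) * bias
        ew_std = np.sqrt(ew_var)
        return (ew_mean, ew_var, ew_std)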
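    For reference, writing w_k = (1 - alpha)^k for k = 0, ..., winSize - 1 (k = 0 being the most recent sample a[winSize - 1]), the quantities computed above are

```latex
\mu = \frac{\sum_{k} w_k\, a_{W-1-k}}{\sum_{k} w_k},
\qquad
\sigma^2 = \underbrace{\frac{\left(\sum_{k} w_k\right)^2}{\left(\sum_{k} w_k\right)^2 - \sum_{k} w_k^2}}_{\texttt{bias}}
\cdot \frac{\sum_{k} w_k \left(a_{W-1-k} - \mu\right)^2}{\sum_{k} w_k},
\qquad
\sigma = \sqrt{\sigma^2},
\qquad
W = \texttt{winSize}.
```

    The `bias` factor appears to be the same unbiasing correction that pandas applies in `ewm(...).var()` with its default `bias=False`.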
    
  • 2020-11-27 12:59

    Here is another solution I came up with in the meantime. It is about four times faster than the pandas solution.

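    import numpy as np

    def numpy_ewma(data, window):
        alpha = 2 / float(window + 1)
        returnArray = np.empty(data.shape[0])  # every entry is written below
        e = data[0]  # seed the average with the first observation
        for s in range(data.shape[0]):
            # Recursive update: e = alpha * data[s] + (1 - alpha) * e
            e = (data[s] - e) * alpha + e
            returnArray[s] = e
        return returnArray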
    

    I used this formula as a starting point. I am sure it can be improved further, but it is at least a start.
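    Note that seeding the recursion with `data[0]`, as `numpy_ewma` does, corresponds to pandas' `ewm(span=window, adjust=False).mean()`, whereas plain `ewm(span=window).mean()` uses the default `adjust=True`. A minimal sketch of how the same loop could be adapted to the default (not from the original answer; the name `ewma_bias_corrected` is hypothetical): start from 0 and divide by the total weight accumulated so far, 1 - (1-alpha)**(s+1).

```python
import numpy as np


def ewma_bias_corrected(data, window):
    """Loop EWMA intended to match pandas' default ewm(span=window).mean() (adjust=True).

    Same update as numpy_ewma, but zero-initialised and divided by the
    total weight seen so far.
    """
    data = np.asarray(data, dtype=float)
    alpha = 2 / (window + 1)
    decay = 1 - alpha
    out = np.empty_like(data)
    e = 0.0  # zero-initialised instead of data[0]
    for s, x in enumerate(data):
        # After this line: e == alpha * sum_{j<=s} decay**(s-j) * data[j]
        e = (x - e) * alpha + e
        # alpha * sum_{k=0}^{s} decay**k == 1 - decay**(s+1), the weight seen so far
        out[s] = e / (1 - decay ** (s + 1))
    return out
```

    The division undoes the pull toward the zero starting value, so the first output equals `data[0]` and the result converges to the `numpy_ewma` output as the series grows.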
