Moving average or running mean

庸人自扰 2020-11-22 08:37

Is there a SciPy or NumPy function, or some other Python module, that calculates the running mean of a 1D array given a specific window?

27 answers
  • 2020-11-22 08:52

    Instead of numpy or scipy, I would recommend pandas to do this more swiftly:

    df['data'].rolling(3).mean()
    

    This computes the 3-period moving average (MA) of the column "data". You can also compute shifted versions. For example, the moving average that excludes the current row (i.e. with the data shifted back by one) is simply:

    df['data'].shift(periods=1).rolling(3).mean()
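    A small worked example of the two calls above, assuming a toy DataFrame with a single column named "data" (the values are made up for illustration). By default, rolling(3) needs three non-missing values before it produces a result, so the first rows are NaN; the shifted version lags by one more row.

```python
import numpy as np
import pandas as pd

# Toy data (assumption: illustrative values only)
df = pd.DataFrame({'data': [1.0, 2.0, 3.0, 4.0, 5.0]})

# 3-period moving average including the current row
ma = df['data'].rolling(3).mean()
# -> NaN, NaN, 2.0, 3.0, 4.0

# 3-period moving average of the previous three rows (current row excluded)
ma_prev = df['data'].shift(periods=1).rolling(3).mean()
# -> NaN, NaN, NaN, 2.0, 3.0

print(pd.DataFrame({'data': df['data'], 'ma': ma, 'ma_prev': ma_prev}))
```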
    
  • 2020-11-22 08:54

    UPDATE: More efficient solutions have since been proposed. Among the "standard" third-party libraries, scipy's uniform_filter1d is probably the best, and some newer or specialized libraries are available too.


    You can use np.convolve for that:

    np.convolve(x, np.ones(N)/N, mode='valid')
    

    Explanation

    The running mean is a case of the mathematical operation of convolution. For the running mean, you slide a window along the input and compute the mean of the window's contents. For discrete 1D signals, convolution is the same thing, except instead of the mean you compute an arbitrary linear combination, i.e., multiply each element by a corresponding coefficient and add up the results. Those coefficients, one for each position in the window, are sometimes called the convolution kernel. The arithmetic mean of N values is (x_1 + x_2 + ... + x_N) / N, so the corresponding kernel is (1/N, 1/N, ..., 1/N), and that's exactly what we get by using np.ones(N)/N.
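    To make the kernel idea concrete, the sketch below (with made-up sample values) checks that convolving with np.ones(N)/N in 'valid' mode gives the same numbers as averaging each length-N window explicitly. np.convolve flips the kernel before sliding it, but because the uniform kernel is symmetric, the flip makes no difference here.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])  # illustrative input
N = 3                                         # window length

# Running mean as a convolution with the uniform kernel (1/N, ..., 1/N)
kernel = np.ones(N) / N
conv = np.convolve(x, kernel, mode='valid')

# The same thing computed literally: the mean of every full window
explicit = np.array([x[i:i + N].mean() for i in range(len(x) - N + 1)])

print(conv)  # [2. 3. 4. 5.]
assert np.allclose(conv, explicit)
```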

    Edges

    The mode argument of np.convolve specifies how to handle the edges. I chose the valid mode here because I think that's how most people expect the running mean to work, but you may have other priorities. Here is a plot that illustrates the difference between the modes:

    import numpy as np
    import matplotlib.pyplot as plt

    modes = ['full', 'same', 'valid']
    for m in modes:
        # Running mean of a constant signal (all ones) with a 50-point window
        plt.plot(np.convolve(np.ones(200), np.ones(50)/50, mode=m))
    plt.axis([-10, 251, -.1, 1.1])
    plt.legend(modes, loc='lower center')
    plt.show()
    

    [Figure: Running mean convolve modes]
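    If you would rather see the difference between the modes as numbers than as a plot, the sketch below reports the output length for each mode, using the same 200-point signal and 50-point window. For an input of length L and a window of length N (with L ≥ N), 'full' returns L + N - 1 points (every partial overlap), 'same' returns L points, and 'valid' returns only the L - N + 1 points where the window fits entirely inside the signal.

```python
import numpy as np

signal = np.ones(200)      # same constant signal as in the plot
kernel = np.ones(50) / 50  # 50-point running-mean kernel

lengths = {}
for m in ['full', 'same', 'valid']:
    out = np.convolve(signal, kernel, mode=m)
    lengths[m] = len(out)
    # 'full' and 'same' include partially overlapping windows, which the
    # implicit zero padding pulls towards 0; 'valid' stays at 1.0 everywhere.
    print(f"{m:5s}: {len(out)} points, min value {out.min():.2f}")

# full: 249 points, same: 200 points, valid: 151 points
```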

  • 2020-11-22 08:54

    I feel this can be solved elegantly using bottleneck.

    See a basic example below:

    import numpy as np
    import bottleneck as bn
    
    a = np.random.randint(4, 1000, size=100)
    mm = bn.move_mean(a, window=5, min_count=1)
    
    • "mm" is the moving mean of "a".

    • "window" is the maximum number of entries in each moving window.

    • "min_count" is the minimum number of non-NaN entries a window must contain to produce a value instead of NaN (this matters for the first few elements, or if the array contains NaN values).

    The good part is that Bottleneck handles NaN values, and it is also very efficient.
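    For readers who cannot install Bottleneck, here is a rough pure-NumPy sketch of what move_mean(a, window, min_count) computes according to the parameter descriptions above: a trailing window ending at each element, NaNs ignored, and NaN output wherever fewer than min_count non-NaN values fall inside the window. The name move_mean_sketch is hypothetical, it assumes 1 ≤ min_count ≤ window, and it is not Bottleneck's actual implementation.

```python
import numpy as np

def move_mean_sketch(a, window, min_count=None):
    """Hypothetical trailing moving mean with bottleneck-style NaN handling.

    Assumes 1 <= min_count <= window; min_count=None means min_count=window.
    """
    a = np.asarray(a, dtype=float)
    if min_count is None:
        min_count = window
    valid = ~np.isnan(a)
    # Prefix sums of the values (NaNs counted as 0) and of the non-NaN counts,
    # with a leading 0 so that each window sum is a simple difference.
    csum = np.concatenate(([0.0], np.cumsum(np.where(valid, a, 0.0))))
    ccount = np.concatenate(([0], np.cumsum(valid)))
    end = np.arange(1, a.size + 1)        # window for output i is a[start:i+1]
    start = np.maximum(end - window, 0)   # shorter windows at the beginning
    wsum = csum[end] - csum[start]
    wcount = ccount[end] - ccount[start]
    out = np.full(a.size, np.nan)
    ok = wcount >= min_count              # enough non-NaN values in the window
    out[ok] = wsum[ok] / wcount[ok]
    return out

a = np.array([1.0, np.nan, 3.0, 4.0, 5.0])
print(move_mean_sketch(a, window=3, min_count=1))
# -> 1.0, 1.0, 2.0, 3.5, 4.0
```

    Prefix sums keep the sketch vectorized, at the cost of some floating-point round-off on very long arrays.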

  • 2020-11-22 08:54

    There is a comment by mab, buried under one of the answers above, that mentions this method: bottleneck has move_mean, which is a simple moving average:

    import numpy as np
    import bottleneck as bn
    
    a = np.arange(10) + np.random.random(10)
    
    mva = bn.move_mean(a, window=2, min_count=1)
    

    min_count is a handy parameter: with min_count=1, each of the first window-1 outputs is the mean of all the values up to that point in your array. If you don't set min_count, it defaults to window, and the first window-1 outputs will be nan.
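    If you already use pandas, the rolling window's min_periods argument plays a similar role to min_count: it is the minimum number of non-missing observations a window needs before a value is produced, and it defaults to the window size. A minimal comparison, assuming the same window=2 as above and a made-up input:

```python
import numpy as np
import pandas as pd

a = np.arange(5, dtype=float)  # illustrative input: 0, 1, 2, 3, 4
s = pd.Series(a)

# Default: min_periods == window, so the first window-1 outputs are NaN
default = s.rolling(window=2).mean().to_numpy()
# -> NaN, 0.5, 1.5, 2.5, 3.5

# min_periods=1: the first output is the mean of what has been seen so far
partial = s.rolling(window=2, min_periods=1).mean().to_numpy()
# -> 0.0, 0.5, 1.5, 2.5, 3.5
```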

  • 2020-11-22 08:56

    Update: The example below uses the old pandas.rolling_mean function, which has been removed from recent versions of pandas. A modern equivalent of the function call below is:

    In [8]: pd.Series(x).rolling(window=N).mean().iloc[N-1:].values
    Out[8]: 
    array([ 0.49815397,  0.49844183,  0.49840518, ...,  0.49488191,
            0.49456679,  0.49427121])
    

    pandas is more suitable for this than NumPy or SciPy. Its function rolling_mean does the job conveniently. It also returns a NumPy array when the input is an array.

    It is difficult to beat rolling_mean's performance with a custom pure-Python implementation. Here is an example performance comparison against two of the proposed solutions:

    In [1]: import numpy as np
    
    In [2]: import pandas as pd
    
    In [3]: def running_mean(x, N):
       ...:     cumsum = np.cumsum(np.insert(x, 0, 0)) 
       ...:     return (cumsum[N:] - cumsum[:-N]) / N
       ...:
    
    In [4]: x = np.random.random(100000)
    
    In [5]: N = 1000
    
    In [6]: %timeit np.convolve(x, np.ones((N,))/N, mode='valid')
    10 loops, best of 3: 172 ms per loop
    
    In [7]: %timeit running_mean(x, N)
    100 loops, best of 3: 6.72 ms per loop
    
    In [8]: %timeit pd.rolling_mean(x, N)[N-1:]
    100 loops, best of 3: 4.74 ms per loop
    
    In [9]: np.allclose(pd.rolling_mean(x, N)[N-1:], running_mean(x, N))
    Out[9]: True
    

    There are also nice options as to how to deal with the edge values.
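    A sketch with the current pandas API, assuming a recent pandas version: first the modern counterpart of the In [9] check (dropping the N-1 leading NaNs), then one of the edge options, a centred window with min_periods=1 that averages whatever values are available at both ends instead of returning NaN. The input data are made up.

```python
import numpy as np
import pandas as pd

def running_mean(x, N):
    # cumsum approach from the session above
    cumsum = np.cumsum(np.insert(x, 0, 0))
    return (cumsum[N:] - cumsum[:-N]) / N

x = np.random.default_rng(0).random(1000)
N = 10

# Modern replacement for pd.rolling_mean(x, N)[N-1:]
trailing = pd.Series(x).rolling(window=N).mean().to_numpy()[N - 1:]
assert np.allclose(trailing, running_mean(x, N))

# Edge handling: centre the window and use partial windows at the ends
y = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])
centred = y.rolling(window=3, center=True, min_periods=1).mean().to_numpy()
# -> 1.5, 2.0, 3.0, 4.0, 4.5
```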

  • 2020-11-22 08:56

    You can use scipy.ndimage.uniform_filter1d (older code imports it from the scipy.ndimage.filters namespace, which is now deprecated):

    import numpy as np
    from scipy.ndimage import uniform_filter1d
    N = 1000
    x = np.random.random(100000)
    y = uniform_filter1d(x, size=N)
    

    uniform_filter1d:

    • returns an output with the same NumPy shape as the input (i.e. the same number of points)
    • offers several ways to handle the border; 'reflect' is the default, but in my case I wanted 'nearest'
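    To see what the border modes mean, the sketch below emulates a centred, same-length running mean with np.pad followed by a 'valid' convolution. Assumptions: the window length N is odd, and the np.pad modes correspond to scipy.ndimage modes as follows: 'edge' behaves like 'nearest', 'symmetric' like the default 'reflect', and 'constant' like 'constant' with cval=0. The helper name centred_running_mean is hypothetical, and this is not how uniform_filter1d is implemented internally.

```python
import numpy as np

def centred_running_mean(x, N, pad_mode='edge'):
    """Hypothetical helper: same-length running mean with explicit borders.

    pad_mode is passed to np.pad, e.g. 'edge' (like 'nearest'),
    'symmetric' (like scipy's default 'reflect') or 'constant' (zeros).
    Assumes N is odd so that the window can be centred on each sample.
    """
    if N % 2 == 0:
        raise ValueError("N must be odd for a centred window")
    half = N // 2
    padded = np.pad(np.asarray(x, dtype=float), half, mode=pad_mode)
    # 'valid' on the padded signal yields exactly len(x) outputs
    return np.convolve(padded, np.ones(N) / N, mode='valid')

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
print(centred_running_mean(x, 3, 'edge'))      # ends use repeated edge values
print(centred_running_mean(x, 3, 'constant'))  # ends are pulled towards 0
```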

    It is also rather quick (nearly 50 times faster than np.convolve and 2-5 times faster than the cumsum approach given above):

    %timeit y1 = np.convolve(x, np.ones((N,))/N, mode='same')
    100 loops, best of 3: 9.28 ms per loop
    
    %timeit y2 = uniform_filter1d(x, size=N)
    10000 loops, best of 3: 191 µs per loop
    

    Here are three functions that let you compare the error and speed of the different implementations:

    import numpy as np
    import scipy.ndimage as ndif

    def running_mean_convolve(x, N):
        return np.convolve(x, np.ones(N) / N, 'valid')

    def running_mean_cumsum(x, N):
        cumsum = np.cumsum(np.insert(x, 0, 0))
        return (cumsum[N:] - cumsum[:-N]) / N

    def running_mean_uniform_filter1d(x, N):
        # origin=-(N//2) makes each window start at the current sample; keeping
        # the first len(x)-N+1 outputs matches 'valid' (and also works for N=1)
        return ndif.uniform_filter1d(x, N, mode='constant', origin=-(N//2))[:len(x) - N + 1]
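    A minimal harness for that comparison, as a sketch: compare_running_means is a hypothetical helper that times each function with timeit and reports its largest deviation from a reference implementation. It assumes every function takes (x, N) and returns the 'valid'-length output, as the three functions above do. The demo passes only copies of the two NumPy-only functions so that it runs without SciPy; running_mean_uniform_filter1d can be added to the dict in the same way. Timings depend entirely on your machine and library versions.

```python
import timeit
import numpy as np

def compare_running_means(funcs, x, N, reference, repeat=5):
    """Hypothetical helper: time each func(x, N) and measure its max abs error."""
    expected = reference(x, N)
    results = {}
    for name, f in funcs.items():
        err = float(np.max(np.abs(f(x, N) - expected)))
        # Best-of-`repeat` wall time for a single call, in seconds
        t = min(timeit.repeat(lambda f=f: f(x, N), number=1, repeat=repeat))
        results[name] = (err, t)
        print(f"{name:10s} max error {err:.2e}   {t * 1e3:.3f} ms")
    return results

def running_mean_convolve(x, N):
    return np.convolve(x, np.ones(N) / N, 'valid')

def running_mean_cumsum(x, N):
    cumsum = np.cumsum(np.insert(x, 0, 0))
    return (cumsum[N:] - cumsum[:-N]) / N

x = np.random.default_rng(0).random(10_000)
res = compare_running_means(
    {'convolve': running_mean_convolve, 'cumsum': running_mean_cumsum},
    x, 100, reference=running_mean_convolve)
```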
    