Moving average or running mean

庸人自扰 2020-11-22 08:37

Is there a SciPy function or NumPy function or module for Python that calculates the running mean of a 1D array given a specific window?

27 answers
  • 2020-11-22 09:15

    Efficient solution

    Convolution is much better than the straightforward approach, but (I guess) it uses FFT and is thus quite slow. However, specifically for computing the running mean, the following approach works fine:

    import numpy

    def running_mean(x, N):
        # prepend a zero so cumsum[i] holds the sum of the first i elements
        cumsum = numpy.cumsum(numpy.insert(x, 0, 0))
        # the difference of cumulative sums N apart is the sum over each window
        return (cumsum[N:] - cumsum[:-N]) / float(N)
    

    The code to check:

    In[3]: x = numpy.random.random(100000)
    In[4]: N = 1000
    In[5]: %timeit result1 = numpy.convolve(x, numpy.ones((N,))/N, mode='valid')
    10 loops, best of 3: 41.4 ms per loop
    In[6]: %timeit result2 = running_mean(x, N)
    1000 loops, best of 3: 1.04 ms per loop
    

    Note that numpy.allclose(result1, result2) is True, so the two methods are equivalent. The greater N, the greater the difference in time.

    Warning: although cumsum is faster, there will be increased floating-point error that may make your results invalid/incorrect/unacceptable.

    The comments pointed out this floating-point error issue, but I am making it more obvious here in the answer.

    import numpy as np

    def running_mean_convolve(x, N):
        return np.convolve(x, np.ones(N) / float(N), 'valid')

    def running_mean_cumsum(x, N):
        cumsum = np.cumsum(np.insert(x, 0, 0))
        return (cumsum[N:] - cumsum[:-N]) / float(N)

    # demonstrate loss of precision with only 100,000 points
    np.random.seed(42)
    x = np.random.randn(100000) + 1e6
    y1 = running_mean_convolve(x, 10)
    y2 = running_mean_cumsum(x, 10)
    assert np.allclose(y1, y2, rtol=1e-12, atol=0)  # fails: cumsum has drifted
    
    • the more points you accumulate over, the greater the floating-point error (so 1e5 points is noticeable, 1e6 points is more significant, and beyond 1e6 you may want to reset the accumulator)
    • you can cheat by using np.longdouble, but the floating-point error will still become significant for a relatively large number of points (around >1e5, but it depends on your data)
    • you can plot the error and see it increasing relatively fast
    • the convolve solution is slower but does not have this floating-point loss of precision
    • the uniform_filter1d solution is faster than this cumsum solution AND does not have this floating-point loss of precision (see the sketch below)
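
    For completeness, here is a minimal sketch of the uniform_filter1d approach mentioned above. scipy.ndimage.uniform_filter1d is a real SciPy function; the specific origin/trim recipe used here to match the 'valid' convolution is my assumption, so verify it on your own data:

    import numpy as np
    from scipy.ndimage import uniform_filter1d

    def running_mean_uniform(x, N):
        # origin=-(N // 2) shifts the window so output[i] averages x[i:i+N];
        # dropping the last N-1 samples then matches np.convolve(..., mode='valid')
        return uniform_filter1d(x, size=N, mode='constant', origin=-(N // 2))[:-(N - 1)]

    x = np.random.random(100000)
    print(np.allclose(running_mean_uniform(x, 1000),
                      np.convolve(x, np.ones(1000) / 1000, mode='valid')))  # True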
  • 2020-11-22 09:15

    I know this is an old question, but here is a solution that doesn't use any extra data structures or libraries. It is linear in the number of elements of the input list, and I cannot think of any other way to make it more efficient (actually, if anyone knows of a better way to allocate the result, please let me know).

    NOTE: this would be much faster using a numpy array instead of a list, but I wanted to eliminate all dependencies. Performance could also be improved by multi-threaded execution.

    The function assumes that the input list is one dimensional, so be careful.

    ### Running mean/Moving average
    def running_mean(l, N):
        sum = 0  # running window sum (shadows the builtin; kept to match the text below)
        result = [0.0] * len(l)

        # ramp-up: the first N outputs use implicit zero padding on the left,
        # so they are also divided by N (see the worked example below)
        for i in range(0, N):
            sum = sum + l[i]
            result[i] = sum / N

        # steady state: add the newest element and drop the one
        # that just left the window
        for i in range(N, len(l)):
            sum = sum - l[i - N] + l[i]
            result[i] = sum / N

        return result
    

    Example

    Assume that we have a list data = [ 1, 2, 3, 4, 5, 6 ] on which we want to compute a rolling mean with a period of 3, and that you also want an output list of the same size as the input (which is most often the case).

    The first element has index 0, so the rolling mean should be computed on the elements at indices -2, -1 and 0. Obviously we don't have data[-2] and data[-1] (unless you want special boundary conditions), so we assume those elements are 0. This is equivalent to zero-padding the list, except we don't actually pad it; we just keep track of the indices that require padding (from 0 to N-1).

    So, for the first N elements we just keep adding up the elements in an accumulator.

    result[0] = (0 + 0 + 1) / 3  = 0.333    ==   (sum + 1) / 3
    result[1] = (0 + 1 + 2) / 3  = 1        ==   (sum + 2) / 3
    result[2] = (1 + 2 + 3) / 3  = 2        ==   (sum + 3) / 3
    

    From element N onwards, simple accumulation doesn't work: we would expect result[3] = (2 + 3 + 4) / 3 = 3, but this differs from (sum + 4) / 3 = 3.333.

    The way to compute the correct value is to subtract data[0] = 1 from sum + 4, giving sum + 4 - 1 = 9, and then 9 / 3 = 3.

    This works because currently sum = data[0] + data[1] + data[2]; the same holds for every i >= N, since before the subtraction sum is data[i-N] + ... + data[i-2] + data[i-1].
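
    As a quick check, running the function on the example data reproduces the values derived above:

    data = [1, 2, 3, 4, 5, 6]
    print(running_mean(data, 3))
    # [0.3333333333333333, 1.0, 2.0, 3.0, 4.0, 5.0]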

  • 2020-11-22 09:16

    Another approach to find the moving average without using numpy or pandas:

    import itertools
    sample = [2, 6, 10, 8, 11, 10]
    list(itertools.starmap(lambda a,b: b/a, 
                   enumerate(itertools.accumulate(sample), 1)))
    

    will print [2.0, 4.0, 6.0, 6.5, 7.4, 7.833333333333333]. Note that this is the cumulative (expanding) mean over all elements seen so far, not a fixed-window moving average.
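
    If you want a fixed window in the same dependency-free spirit, here is a minimal sketch (my addition) that applies the prefix-sum trick with itertools.accumulate:

    from itertools import accumulate

    def moving_average(sample, n):
        # prefix sums with a leading zero; the difference of sums n apart
        # is the sum over a window of n elements
        csum = [0] + list(accumulate(sample))
        return [(csum[i + n] - csum[i]) / n for i in range(len(sample) - n + 1)]

    print(moving_average([2, 6, 10, 8, 11, 10], 3))
    # [6.0, 8.0, 9.666666666666666, 9.666666666666666]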

  • 2020-11-22 09:16

    All the aforementioned solutions are poor because they lack

    • speed, because they use native Python instead of a numpy vectorized implementation,
    • numerical stability, because of poor use of numpy.cumsum, or
    • speed, because they are O(len(x) * w) implementations such as convolutions.

    Given

    import numpy
    m = 10000
    x = numpy.random.rand(m)
    w = 1000
    

    x_ = numpy.insert(x, 0, 0)
    sliding_average = x_[:w].sum() / w + numpy.cumsum(x_[w:] - x_[:-w]) / w


    Note that x_[:w].sum() equals x[:w-1].sum(). So for the first average, the numpy.cumsum(...) term adds x[w-1] / w (via x_[w] / w) and subtracts 0 (via x_[0] / w), which results in x[0:w].mean().

    Via cumsum, the second average additionally adds x[w] / w (via x_[w+1] / w) and subtracts x[0] / w (via x_[1] / w), resulting in x[1:w+1].mean().

    This goes on until x[-w:].mean() is reached.

    This solution is vectorized, O(m), readable, and numerically stable.
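
    As a quick sanity check (my addition), the result matches a convolve-based reference:

    reference = numpy.convolve(x, numpy.ones(w) / w, mode='valid')
    print(numpy.allclose(sliding_average, reference))  # True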

  • 2020-11-22 09:16

    Another solution using just the standard library's collections.deque:

    from collections import deque
    import itertools

    def moving_average(iterable, n=3):
        # http://en.wikipedia.org/wiki/Moving_average
        it = iter(iterable)  # create an iterator from the input argument
        # pre-fill the deque with the first n-1 elements
        d = deque(itertools.islice(it, n - 1))
        # prepend a dummy 0 so the first iteration pops it instead of real data
        d.appendleft(0)
        s = sum(d)
        for elem in it:
            s += elem - d.popleft()  # slide the window: add new, drop oldest
            d.append(elem)
            yield s / n
    
    # example on how to use it
    for i in moving_average([40, 30, 50, 46, 39, 44]):
        print(i)
    
    # 40.0
    # 42.0
    # 45.0
    # 43.0
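
    A variant of the same idea (my addition): deque(maxlen=n) evicts the oldest element automatically, so the running sum can read it from the front of the window just before appending:

    from collections import deque

    def moving_average_maxlen(iterable, n=3):
        s = 0
        window = deque(maxlen=n)
        for elem in iterable:
            if len(window) == n:
                s -= window[0]  # the element maxlen is about to evict
            s += elem
            window.append(elem)
            if len(window) == n:
                yield s / n

    print(list(moving_average_maxlen([40, 30, 50, 46, 39, 44])))
    # [40.0, 42.0, 45.0, 43.0]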
    
  • 2020-11-22 09:16

    How about a moving average filter? It is also a one-liner and has the advantage that you can easily manipulate the window type if you need something other than a rectangle, e.g. an N-long simple moving average of an array a:

    import numpy as np
    from scipy.signal import lfilter

    lfilter(np.ones(N) / N, [1], a)[N:]
    

    And with the triangular window applied:

    from scipy.signal.windows import triang  # was scipy.signal.triang in older SciPy

    lfilter(np.ones(N) * triang(N) / N, [1], a)[N:]
    

    Note: I usually discard the first N samples as bogus, hence the [N:] at the end, but this is not necessary; it is merely a matter of personal choice.
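
    For reference, a quick check (my addition) that the rectangular-window filter agrees with the convolution approach once the initial transient has passed:

    import numpy as np
    from scipy.signal import lfilter

    a = np.random.randn(10000)
    N = 50
    y = lfilter(np.ones(N) / N, [1], a)
    # y[N-1] is the first output whose window lies fully inside the data
    print(np.allclose(y[N - 1:], np.convolve(a, np.ones(N) / N, mode='valid')))  # True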
