Moving average or running mean

庸人自扰 2020-11-22 08:37

Is there a SciPy function or NumPy function or module for Python that calculates the running mean of a 1D array given a specific window?

27 answers
  • 2020-11-22 08:56

    For educational purposes, let me add two more NumPy solutions (both slower than the cumsum-based solution):

    import numpy as np
    from numpy.lib.stride_tricks import as_strided
    
    def ra_strides(arr, window):
        ''' Running average using as_strided'''
        n = arr.shape[0] - window + 1
        arr_strided = as_strided(arr, shape=[n, window], strides=2*arr.strides)
        return arr_strided.mean(axis=1)
    
    def ra_add(arr, window):
        ''' Running average using add.reduceat'''
        n = arr.shape[0] - window + 1
        indices = np.array([0, window]*n) + np.repeat(np.arange(n), 2)
        arr = np.append(arr, 0)
        return np.add.reduceat(arr, indices)[::2] / window
    

    Functions used: as_strided, add.reduceat
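
    For reference, here is a minimal sketch of the cumsum-based approach these are being compared against (the function name is my own); it produces the same 'valid'-length output:

    import numpy as np

    def ra_cumsum(arr, window):
        ''' Running average using np.cumsum (the faster baseline)'''
        csum = np.cumsum(np.insert(arr, 0, 0.0))  # prefix sums with a leading zero
        return (csum[window:] - csum[:-window]) / window

    On NumPy 1.20+, numpy.lib.stride_tricks.sliding_window_view builds the same windowed view as the as_strided trick without handling strides by hand.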

  • 2020-11-22 08:57

    From reading the other answers I don't think this is quite what the question asked for, but I got here needing to keep a running average of a list of values that was growing in size.

    So if you want to keep a list of values that you are acquiring from somewhere (a site, a measuring device, etc.), plus the average of the last n values kept up to date, you can use the code below, which minimizes the effort of adding new elements:

    class Running_Average(object):
        def __init__(self, buffer_size=10):
            """
            Create a new Running_Average object.
    
            This object allows the efficient calculation of the average of the last
            `buffer_size` numbers added to it.
    
            Examples
            --------
            >>> a = Running_Average(2)
            >>> a.add(1)
            >>> a.get()
            1.0
            >>> a.add(1)  # there are now two 1s in the buffer
            >>> a.get()
            1.0
            >>> a.add(2)  # there's a 1 and a 2 in the buffer
            >>> a.get()
            1.5
            >>> a.add(2)
            >>> a.get()  # now there are only two 2s in the buffer
            2.0
            """
            self._buffer_size = int(buffer_size)  # make sure it's an int
            self.reset()
    
        def add(self, new):
            """
            Add a new number to the buffer, or replace the oldest one if the buffer is full.
            """
            new = float(new)  # make sure it's a float
            n = len(self._buffer)
            if n < self.buffer_size:  # still filling the buffer.
                self._buffer.append(new)
                if self._average != self._average:  # ~ if isNaN().
                    self._average = new  # no previous numbers, so it's new.
                else:
                    self._average *= n  # so it's only the sum of numbers.
                    self._average += new  # add new number.
                    self._average /= (n+1)  # divide by new number of numbers.
            else:  # buffer full, replace oldest value.
                old = self._buffer[self._index]  # the previous oldest number.
                self._buffer[self._index] = new  # replace with new one.
                self._index += 1  # update the index and make sure it's...
                self._index %= self.buffer_size  # ... smaller than buffer_size.
                self._average -= old/self.buffer_size  # remove old one...
                self._average += new/self.buffer_size  # ...and add new one...
                # ... weighted by the number of elements.
    
        def __call__(self):
            """
            Return the moving average value (a shortcut for those who don't
            want to write .get()).
            """
            return self._average
    
        def get(self):
            """
            Return the moving average value.
            """
            return self()
    
        def reset(self):
            """
            Reset the moving average.
    
            If for some reason you don't want to just create a new one.
            """
            self._buffer = []  # could use np.empty(self.buffer_size)...
            self._index = 0  # and use this to keep track of how many numbers.
            self._average = float('nan')  # could use np.NaN .
    
        def get_buffer_size(self):
            """
            Return current buffer_size.
            """
            return self._buffer_size
    
        def set_buffer_size(self, buffer_size):
            """
            >>> a = Running_Average(10)
            >>> for i in range(15):
            ...     a.add(i)
            ...
            >>> a()
            9.5
            >>> a._buffer  # should not access this!!
            [10.0, 11.0, 12.0, 13.0, 14.0, 5.0, 6.0, 7.0, 8.0, 9.0]
    
            Decreasing buffer size:
            >>> a.buffer_size = 6
            >>> a._buffer  # should not access this!!
            [9.0, 10.0, 11.0, 12.0, 13.0, 14.0]
            >>> a.buffer_size = 2
            >>> a._buffer
            [13.0, 14.0]
    
            Increasing buffer size:
            >>> a.buffer_size = 5
            Warning: no older data available!
            >>> a._buffer
            [13.0, 14.0]
    
            Keeping buffer size:
            >>> a = Running_Average(10)
            >>> for i in range(15):
            ...     a.add(i)
            ...
            >>> a()
            9.5
            >>> a._buffer  # should not access this!!
            [10.0, 11.0, 12.0, 13.0, 14.0, 5.0, 6.0, 7.0, 8.0, 9.0]
            >>> a.buffer_size = 10  # reorders buffer!
            >>> a._buffer
            [5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 14.0]
            """
            buffer_size = int(buffer_size)
            # order the buffer so index is zero again:
            new_buffer = self._buffer[self._index:]
            new_buffer.extend(self._buffer[:self._index])
            self._index = 0
            if self._buffer_size < buffer_size:
                print('Warning: no older data available!')  # should use Warnings!
            else:
                diff = self._buffer_size - buffer_size
                new_buffer = new_buffer[diff:]
            self._buffer_size = buffer_size
            self._buffer = new_buffer
    
        buffer_size = property(get_buffer_size, set_buffer_size)
    

    And you can test it with, for example:

    def graph_test(N=200):
        import matplotlib.pyplot as plt
        values = list(range(N))
        values_average_calculator = Running_Average(N // 2)
        values_averages = []
        for value in values:
            values_average_calculator.add(value)
            values_averages.append(values_average_calculator())
        fig, ax = plt.subplots(1, 1)
        ax.plot(values, label='values')
        ax.plot(values_averages, label='averages')
        ax.grid()
        ax.set_xlim(0, N)
        ax.set_ylim(0, N)
        fig.show()
    

    Which gives a plot of the raw values together with their running average.
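
    A quicker text-only check of the class (the sample values here are arbitrary):

    ra = Running_Average(3)
    for x in [1, 2, 3, 4, 5]:
        ra.add(x)
        print(ra())  # prints 1.0, 1.5, 2.0, 3.0, 4.0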

  • 2020-11-22 09:01

    Use Only Python Standard Library (Memory Efficient)

    Here is another version that uses only the standard library's deque. It surprised me a bit that most of the answers use pandas or NumPy.

    from collections import deque

    def moving_average(iterable, n=3):
        d = deque(maxlen=n)
        for i in iterable:
            d.append(i)
            if len(d) == n:
                yield sum(d)/n
    
    r = moving_average([40, 30, 50, 46, 39, 44])
    assert list(r) == [40.0, 42.0, 45.0, 43.0]
    

    I also found another implementation in the Python docs:

    import itertools
    from collections import deque

    def moving_average(iterable, n=3):
        # moving_average([40, 30, 50, 46, 39, 44]) --> 40.0 42.0 45.0 43.0
        # http://en.wikipedia.org/wiki/Moving_average
        it = iter(iterable)
        d = deque(itertools.islice(it, n-1))
        d.appendleft(0)
        s = sum(d)
        for elem in it:
            s += elem - d.popleft()
            d.append(elem)
            yield s / n
    

    However, that implementation seems a bit more complex than it needs to be. It must be in the standard Python docs for a reason, though; could someone comment on my implementation versus the one from the docs?

  • 2020-11-22 09:02

    "...or module for Python that calculates..."

    In my tests at Tradewave.net, TA-lib always wins:

    import talib as ta
    import numpy as np
    import pandas as pd
    import scipy
    from scipy import signal
    import time as t
    
    PAIR = info.primary_pair
    PERIOD = 30
    
    def initialize():
        storage.reset()
        storage.elapsed = storage.get('elapsed', [0,0,0,0,0,0])
    
    def cumsum_sma(array, period):
        ret = np.cumsum(array, dtype=float)
        ret[period:] = ret[period:] - ret[:-period]
        return ret[period - 1:] / period
    
    def pandas_sma(array, period):
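        # Note: pd.rolling_mean has been removed from recent pandas versions;
        # the modern equivalent is pd.Series(array).rolling(period).mean()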
        return pd.rolling_mean(array, period)
    
    def api_sma(array, period):
        # this method is native to Tradewave and does NOT return an array
        return (data[PAIR].ma(PERIOD))
    
    def talib_sma(array, period):
        return ta.MA(array, period)
    
    def convolve_sma(array, period):
        return np.convolve(array, np.ones((period,))/period, mode='valid')
    
    def fftconvolve_sma(array, period):    
        return scipy.signal.fftconvolve(
            array, np.ones((period,))/period, mode='valid')    
    
    def tick():
    
        close = data[PAIR].warmup_period('close')
    
        t1 = t.time()
        sma_api = api_sma(close, PERIOD)
        t2 = t.time()
        sma_cumsum = cumsum_sma(close, PERIOD)
        t3 = t.time()
        sma_pandas = pandas_sma(close, PERIOD)
        t4 = t.time()
        sma_talib = talib_sma(close, PERIOD)
        t5 = t.time()
        sma_convolve = convolve_sma(close, PERIOD)
        t6 = t.time()
        sma_fftconvolve = fftconvolve_sma(close, PERIOD)
        t7 = t.time()
    
        storage.elapsed[-1] = storage.elapsed[-1] + t2-t1
        storage.elapsed[-2] = storage.elapsed[-2] + t3-t2
        storage.elapsed[-3] = storage.elapsed[-3] + t4-t3
        storage.elapsed[-4] = storage.elapsed[-4] + t5-t4
        storage.elapsed[-5] = storage.elapsed[-5] + t6-t5    
        storage.elapsed[-6] = storage.elapsed[-6] + t7-t6        
    
        plot('sma_api', sma_api)  
        plot('sma_cumsum', sma_cumsum[-5])
        plot('sma_pandas', sma_pandas[-10])
        plot('sma_talib', sma_talib[-15])
        plot('sma_convolve', sma_convolve[-20])    
        plot('sma_fftconvolve', sma_fftconvolve[-25])
    
    def stop():
    
        log('ticks....: %s' % info.max_ticks)
    
        log('api......: %.5f' % storage.elapsed[-1])
        log('cumsum...: %.5f' % storage.elapsed[-2])
        log('pandas...: %.5f' % storage.elapsed[-3])
        log('talib....: %.5f' % storage.elapsed[-4])
        log('convolve.: %.5f' % storage.elapsed[-5])    
        log('fft......: %.5f' % storage.elapsed[-6])
    

    results:

    [2015-01-31 23:00:00] ticks....: 744
    [2015-01-31 23:00:00] api......: 0.16445
    [2015-01-31 23:00:00] cumsum...: 0.03189
    [2015-01-31 23:00:00] pandas...: 0.03677
    [2015-01-31 23:00:00] talib....: 0.00700  # <<< Winner!
    [2015-01-31 23:00:00] convolve.: 0.04871
    [2015-01-31 23:00:00] fft......: 0.22306
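
    Outside the Tradewave environment, a rough standalone comparison of the pure NumPy/SciPy variants above can be run with timeit (a sketch; the array length and repeat count are arbitrary):

    import timeit
    import numpy as np
    from scipy import signal

    array = np.random.rand(10000)
    period = 30

    def cumsum_sma(array, period):
        ret = np.cumsum(array, dtype=float)
        ret[period:] = ret[period:] - ret[:-period]
        return ret[period - 1:] / period

    def convolve_sma(array, period):
        return np.convolve(array, np.ones(period) / period, mode='valid')

    def fftconvolve_sma(array, period):
        return signal.fftconvolve(array, np.ones(period) / period, mode='valid')

    for fn in (cumsum_sma, convolve_sma, fftconvolve_sma):
        elapsed = timeit.timeit(lambda: fn(array, period), number=100)
        print('%-16s %.4f s' % (fn.__name__, elapsed))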
    

  • 2020-11-22 09:02

    I haven't yet checked how fast this is, but you could try:

    from collections import deque
    
    cache = deque()  # keep track of the values inside the window
    averages = []    # collected running means
    n = 10           # window size
    A = range(100)   # some dummy iterable
    cum_sum = 0      # initialize cumulative sum

    for t, val in enumerate(A, 1):
        cache.append(val)
        cum_sum += val
        if t <= n:                      # window not yet saturated,
            avg = cum_sum / float(t)    # average over what we have so far
        else:                           # window is saturated,
            cum_sum -= cache.popleft()  # subtract the oldest value
            avg = cum_sum / float(n)
        averages.append(avg)
    
  • 2020-11-22 09:02

    Although there are already solutions to this question here, please take a look at mine. It is very simple and works well.

    import numpy as np

    dataset = np.asarray([1, 2, 3, 4, 5, 6, 7])
    ma = list()
    window = 3
    for t in range(len(dataset) - window + 1):
        indices = range(t, t + window)
        ma.append(np.average(np.take(dataset, indices)))
    ma = np.asarray(ma)  # convert to an array once the loop is done
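
    For comparison, the same result can be produced with a single vectorized call (a sketch using np.convolve, as in earlier answers):

    import numpy as np

    dataset = np.asarray([1, 2, 3, 4, 5, 6, 7])
    window = 3
    ma = np.convolve(dataset, np.ones(window) / window, mode='valid')
    # array([2., 3., 4., 5., 6.])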
    