Moving average or running mean

庸人自扰 2020-11-22 08:37

Is there a SciPy function or NumPy function or module for Python that calculates the running mean of a 1D array given a specific window?

27 answers
  • 2020-11-22 09:04

    You can calculate a running mean with:

    import numpy as np
    
    def runningMean(x, N):
        y = np.zeros((len(x),))
        for ctr in range(len(x)):
            # the last N-1 windows hold fewer than N values but are still divided by N
            y[ctr] = np.sum(x[ctr:(ctr+N)])
        return y/N
    

    But it's slow.

    Fortunately, numpy includes a convolve function which we can use to speed things up. The running mean is equivalent to convolving x with a vector that is N long, with all members equal to 1/N. In its default 'full' mode, np.convolve includes the starting transient, so you have to remove the first N-1 points:

    def runningMeanFast(x, N):
        return np.convolve(x, np.ones((N,))/N)[(N-1):]
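    A quick check that the two functions compute the same thing. The test array and N = 3 are arbitrary choices; the functions are repeated so the snippet runs on its own. It also shows that in both versions the last N-1 outputs average fewer than N samples but still divide by N, so they shrink toward zero at the tail.

```python
import numpy as np

def runningMean(x, N):
    # slow version: one Python-level sum per output position
    y = np.zeros((len(x),))
    for ctr in range(len(x)):
        y[ctr] = np.sum(x[ctr:(ctr+N)])
    return y/N

def runningMeanFast(x, N):
    # 'full' convolution with a box kernel, dropping the first N-1 (transient) points
    return np.convolve(x, np.ones((N,))/N)[(N-1):]

x = np.arange(10, dtype=float)
N = 3
slow = runningMean(x, N)
fast = runningMeanFast(x, N)
print(np.allclose(slow, fast))
print(fast[-2:])  # partial windows: (8 + 9)/3 and 9/3
```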
    

    On my machine, the fast version is 20-30 times faster, depending on the length of the input vector and size of the averaging window.

    Note that convolve does include a 'same' mode which seems like it should address the starting transient issue, but it splits it between the beginning and end.
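    If you only want averages over complete windows, np.convolve also has a 'valid' mode, which drops the transient at both ends. A minimal sketch; the function name is illustrative:

```python
import numpy as np

def running_mean_valid(x, N):
    # 'valid' keeps only positions where the N-long kernel fully overlaps x,
    # so the result has len(x) - N + 1 entries and no partial windows
    return np.convolve(x, np.ones(N) / N, mode='valid')

print(running_mean_valid([1, 2, 3, 4, 5, 6, 7], 3))
```

    The result matches the first len(x) - N + 1 values of the 'full'-mode version, i.e. the ones computed from a full window.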

  • 2020-11-22 09:08

    Python standard library solution

    This generator function takes an iterable and a window size N and yields the average over the current values inside the window. It uses a deque, which is a data structure similar to a list, but optimized for fast modifications (pop, append) at both ends.

    from collections import deque
    from itertools import islice
    
    def sliding_avg(iterable, N):        
        it = iter(iterable)
        window = deque(islice(it, N))        
        num_vals = len(window)
    
        if num_vals < N:
            msg = 'window size {} exceeds total number of values {}'
            raise ValueError(msg.format(N, num_vals))
    
        N = float(N) # force floating point division if using Python 2
        s = sum(window)
        
        while True:
            yield s/N
            try:
                nxt = next(it)
            except StopIteration:
                break
            s = s - window.popleft() + nxt
            window.append(nxt)
            
    

    Here is the function in action:

    >>> values = range(100)
    >>> N = 5
    >>> window_avg = sliding_avg(values, N)
    >>> next(window_avg) # (0 + 1 + 2 + 3 + 4)/5
    2.0
    >>> next(window_avg) # (1 + 2 + 3 + 4 + 5)/5
    3.0
    >>> next(window_avg) # (2 + 3 + 4 + 5 + 6)/5
    4.0
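    A variation (not the author's code) is to let deque(maxlen=N) evict the oldest value automatically; the running total still has to be updated by hand. Unlike sliding_avg above, this hypothetical running_avg_maxlen simply yields nothing when there are fewer than N values instead of raising ValueError. With floating-point input, both versions can accumulate small rounding errors in the running total over very long streams.

```python
from collections import deque

def running_avg_maxlen(iterable, N):
    # a full deque(maxlen=N) drops its oldest item automatically on append
    window = deque(maxlen=N)
    total = 0.0
    for value in iterable:
        if len(window) == N:
            total -= window[0]  # this value is evicted by the append below
        window.append(value)
        total += value
        if len(window) == N:
            yield total / N

print(list(running_avg_maxlen(range(10), 5)))
```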
    
  • 2020-11-22 09:09

    There are many answers above about calculating a running mean. My answer adds two extra features:

    1. ignores nan values
    2. calculates the mean for the N neighboring values NOT including the value of interest itself

    This second feature is particularly useful for determining which values differ from the general trend by a certain amount.

    I use numpy.cumsum since it is the most time-efficient method (see Alleo's answer above).
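    For reference, the plain cumsum running mean without the nan handling is a minimal sketch like the following (Alleo's answer is not reproduced here, and the function name is illustrative). Prepending a 0 makes each window sum a difference of two cumulative sums. For very long float arrays the cumulative sum can lose some precision.

```python
import numpy as np

def running_mean_cumsum(x, N):
    # prepend 0 so that csum[i + N] - csum[i] == sum(x[i:i + N])
    csum = np.cumsum(np.insert(np.asarray(x, dtype=float), 0, 0.0))
    return (csum[N:] - csum[:-N]) / N

print(running_mean_cumsum([1, 2, 3, 4, 5, 6, 7], 3))
```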

    import numpy as np
    
    x = np.asarray(x, dtype=float) # x: 1D data; must be float to hold the nan padding
    N = 10 # total number of neighbours (N/2 on each side of the point of interest), must be even
    nan_pad = np.full(N // 2, np.nan)
    # pad N/2 nans at both ends, plus a leading 0 so that cumsum[i+N+1] - cumsum[i] covers the window around x[i]
    padded_x = np.insert(np.insert(np.insert(x, len(x), nan_pad), 0, nan_pad), 0, 0)
    n_nan = np.cumsum(np.isnan(padded_x))
    cumsum = np.nancumsum(padded_x)
    window_sum = cumsum[N+1:] - cumsum[:-(N+1)] - x # subtract value of interest from sum of all values within window
    window_n_nan = n_nan[N+1:] - n_nan[:-(N+1)] - np.isnan(x)
    window_n_values = (N - window_n_nan)
    movavg = (window_sum) / (window_n_values)
    

    This code works for even Ns only. It can be adjusted for odd numbers by changing how padded_x is padded (n_nan is derived from padded_x).

    [Figure: example output, raw data in black, movavg in blue]

    This code can easily be adapted to discard all moving-average values calculated from fewer than cutoff = 3 non-nan values:

    window_n_values = (N - window_n_nan).astype(float) # dtype must be float to set some values to nan
    cutoff = 3
    window_n_values[window_n_values<cutoff] = np.nan
    movavg = (window_sum) / (window_n_values)
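    The same steps wrapped into a reusable function, as a sketch. The name neighbour_mean and its signature are hypothetical, not part of the answer. Note that a point that is itself nan gets nan, because subtracting it from the window sum propagates the nan.

```python
import numpy as np

def neighbour_mean(x, N=10, cutoff=None):
    """Hypothetical helper: mean of the N neighbours of each point
    (N/2 per side), ignoring nans and excluding the point itself."""
    if N % 2:
        raise ValueError("N must be even")
    x = np.asarray(x, dtype=float)
    nan_pad = np.full(N // 2, np.nan)
    # leading 0, then N/2 nans, the data, and N/2 nans
    padded_x = np.concatenate(([0.0], nan_pad, x, nan_pad))
    n_nan = np.cumsum(np.isnan(padded_x))
    csum = np.nancumsum(padded_x)
    window_sum = csum[N + 1:] - csum[:-(N + 1)] - x
    window_n_nan = n_nan[N + 1:] - n_nan[:-(N + 1)] - np.isnan(x)
    window_n_values = (N - window_n_nan).astype(float)
    if cutoff is not None:
        # too few non-nan neighbours: force the result to nan
        window_n_values[window_n_values < cutoff] = np.nan
    with np.errstate(invalid='ignore', divide='ignore'):  # 0/0 -> nan quietly
        return window_sum / window_n_values

print(neighbour_mean([1, 2, np.nan, 4, 5], N=2))
```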
    

  • 2020-11-22 09:09

    Using @Aikude's variables, I wrote a one-liner:

    import numpy as np
    
    mylist = [1, 2, 3, 4, 5, 6, 7]
    N = 3
    
    mean = [float(np.mean(mylist[x:x+N])) for x in range(len(mylist)-N+1)]
    print(mean)
    
    # [2.0, 3.0, 4.0, 5.0, 6.0]
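    The list comprehension re-slices the list for every window. With NumPy 1.20 or newer, numpy.lib.stride_tricks.sliding_window_view produces all windows as a read-only view without copying, and a single mean call then averages each row. A sketch using the same variables:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # requires NumPy >= 1.20

mylist = [1, 2, 3, 4, 5, 6, 7]
N = 3
# shape (len(mylist) - N + 1, N): each row is one window
windows = sliding_window_view(np.asarray(mylist, dtype=float), N)
mean = windows.mean(axis=1)
print(mean)
```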
    
  • 2020-11-22 09:10

    For a short, fast solution that does the whole thing in one loop, without dependencies, the code below works great.

    mylist = [1, 2, 3, 4, 5, 6, 7]
    N = 3
    cumsum, moving_aves = [0], []
    
    for i, x in enumerate(mylist, 1):
        cumsum.append(cumsum[i-1] + x)
        if i>=N:
            moving_ave = (cumsum[i] - cumsum[i-N])/N
            #can do stuff with moving_ave here
            moving_aves.append(moving_ave)
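    The same cumulative-sum idea can be written more compactly with itertools.accumulate when you don't need to act on each value inside the loop. This is a sketch, and moving_averages is a hypothetical name. It builds the full prefix-sum list first instead of working in a single streaming pass.

```python
from itertools import accumulate

def moving_averages(values, N):
    # cumsum[i] is the sum of the first i values, with cumsum[0] == 0
    cumsum = [0, *accumulate(values)]
    return [(cumsum[i] - cumsum[i - N]) / N for i in range(N, len(cumsum))]

print(moving_averages([1, 2, 3, 4, 5, 6, 7], 3))
```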
    
  • 2020-11-22 09:13

    A bit late to the party, but I've made my own little function that does NOT wrap around the ends or pad with zeros that then get included in the average. As a further treat, it also re-samples the signal at linearly spaced points. Customize the code at will to get other features.

    The method is a simple matrix multiplication with a normalized Gaussian kernel.

    import numpy as np
    
    def running_mean(y_in, x_in, N_out=101, sigma=1):
        '''
        Returns running mean as a Bell-curve weighted average at evenly spaced
        points. Does NOT wrap signal around, or pad with zeros.
    
        Arguments:
        y_in -- y values, the values to be smoothed and re-sampled
        x_in -- x values for array
    
        Keyword arguments:
        N_out -- number of elements in the resampled array
        sigma -- 'width' of the bell curve, in units of x
        '''
        # Gaussian kernel: one row per output point, one column per input point
        x_out = np.linspace(np.min(x_in), np.max(x_in), N_out)
        x_in_mesh, x_out_mesh = np.meshgrid(x_in, x_out)
        gauss_kernel = np.exp(-np.square(x_in_mesh - x_out_mesh) / (2 * sigma**2))
        # Normalize kernel, such that the sum is one along axis 1
        normalization = np.sum(gauss_kernel, axis=1, keepdims=True)
        gauss_kernel_normalized = gauss_kernel / normalization
        # Perform running average as a linear operation
        y_out = gauss_kernel_normalized @ y_in
    
        return y_out, x_out
    

    A simple usage example on a sinusoidal signal with added normally distributed noise:
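    The original plot is not available. The sketch below is a reconstruction under assumptions: the noise level (0.3), the kernel width (sigma=0.2), and the sample count are assumed values. It compares the error against the noise-free sine before and after smoothing, and it repeats the Gaussian-kernel computation so it runs on its own. Near the ends the kernel is one-sided, so the smoothed curve is somewhat biased there.

```python
import numpy as np

def running_mean(y_in, x_in, N_out=101, sigma=1):
    # same computation as the function above, using np.sum(..., axis=1)
    # for the row normalization; repeated so this snippet is self-contained
    x_out = np.linspace(np.min(x_in), np.max(x_in), N_out)
    x_in_mesh, x_out_mesh = np.meshgrid(x_in, x_out)
    gauss_kernel = np.exp(-np.square(x_in_mesh - x_out_mesh) / (2 * sigma**2))
    gauss_kernel /= np.sum(gauss_kernel, axis=1, keepdims=True)
    return gauss_kernel @ y_in, x_out

rng = np.random.default_rng(0)
x = np.linspace(0, 2 * np.pi, 500)
y_clean = np.sin(x)
y_noisy = y_clean + rng.normal(scale=0.3, size=x.size)  # assumed noise level

y_smooth, x_smooth = running_mean(y_noisy, x, N_out=101, sigma=0.2)

# compare each signal with the noise-free sine at its own sample points
rms_noisy = np.sqrt(np.mean((y_noisy - y_clean) ** 2))
rms_smooth = np.sqrt(np.mean((y_smooth - np.sin(x_smooth)) ** 2))
print(f"RMS error before: {rms_noisy:.3f}, after: {rms_smooth:.3f}")
```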
