Moving average or running mean

后端未结

关注

 27  1075

Is there a SciPy function or NumPy function or module for Python that calculates the running mean of a 1D array given a specific window?

相关标签:

27条回答

遥遥无期

2020-11-22 09:04
You can calculate a running mean with:
```
import numpy as np

def runningMean(x, N):
    y = np.zeros((len(x),))
    for ctr in range(len(x)):
         y[ctr] = np.sum(x[ctr:(ctr+N)])
    return y/N
```
But it's slow.

Fortunately, numpy includes a convolve function which we can use to speed things up. The running mean is equivalent to convolving x with a vector that is N long, with all members equal to 1/N. The numpy implementation of convolve includes the starting transient, so you have to remove the first N-1 points:
```
def runningMeanFast(x, N):
    return np.convolve(x, np.ones((N,))/N)[(N-1):]
```
On my machine, the fast version is 20-30 times faster, depending on the length of the input vector and size of the averaging window.

Note that convolve does include a 'same' mode which seems like it should address the starting transient issue, but it splits it between the beginning and end.
0 讨论(0)
发布评论:

提交评论
- 加载中...

不知归路

2020-11-22 09:08

Python standard library solution

This generator-function takes an iterable and a window size N and yields the average over the current values inside the window. It uses a deque, which is a datastructure similar to a list, but optimized for fast modifications (pop, append) at both endpoints.

from collections import deque
from itertools import islice

def sliding_avg(iterable, N):        
    it = iter(iterable)
    window = deque(islice(it, N))        
    num_vals = len(window)

    if num_vals < N:
        msg = 'window size {} exceeds total number of values {}'
        raise ValueError(msg.format(N, num_vals))

    N = float(N) # force floating point division if using Python 2
    s = sum(window)
    
    while True:
        yield s/N
        try:
            nxt = next(it)
        except StopIteration:
            break
        s = s - window.popleft() + nxt
        window.append(nxt)

Here is the function in action:

>>> values = range(100)
>>> N = 5
>>> window_avg = sliding_avg(values, N)
>>> 
>>> next(window_avg) # (0 + 1 + 2 + 3 + 4)/5
>>> 2.0
>>> next(window_avg) # (1 + 2 + 3 + 4 + 5)/5
>>> 3.0
>>> next(window_avg) # (2 + 3 + 4 + 5 + 6)/5
>>> 4.0

0 讨论(0)

旧时难觅i

2020-11-22 09:09
There are many answers above about calculating a running mean. My answer adds two extra features:
1. ignores nan values
2. calculates the mean for the N neighboring values NOT including the value of interest itself
This second feature is particularly useful for determining which values differ from the general trend by a certain amount.

I use numpy.cumsum since it is the most time-efficient method (see Alleo's answer above).
```
N=10 # number of points to test on each side of point of interest, best if even
padded_x = np.insert(np.insert( np.insert(x, len(x), np.empty(int(N/2))*np.nan), 0, np.empty(int(N/2))*np.nan ),0,0)
n_nan = np.cumsum(np.isnan(padded_x))
cumsum = np.nancumsum(padded_x) 
window_sum = cumsum[N+1:] - cumsum[:-(N+1)] - x # subtract value of interest from sum of all values within window
window_n_nan = n_nan[N+1:] - n_nan[:-(N+1)] - np.isnan(x)
window_n_values = (N - window_n_nan)
movavg = (window_sum) / (window_n_values)
```
This code works for even Ns only. It can be adjusted for odd numbers by changing the np.insert of padded_x and n_nan.

Example output (raw in black, movavg in blue):

This code can be easily adapted to remove all moving average values calculated from fewer than cutoff = 3 non-nan values.
```
window_n_values = (N - window_n_nan).astype(float) # dtype must be float to set some values to nan
cutoff = 3
window_n_values[window_n_values<cutoff] = np.nan
movavg = (window_sum) / (window_n_values)
```
0 讨论(0)
发布评论:

提交评论
- 加载中...

失恋的感觉

2020-11-22 09:09

With @Aikude's variables, I wrote one-liner.

import numpy as np

mylist = [1, 2, 3, 4, 5, 6, 7]
N = 3

mean = [np.mean(mylist[x:x+N]) for x in range(len(mylist)-N+1)]
print(mean)

>>> [2.0, 3.0, 4.0, 5.0, 6.0]

0 讨论(0)

囚心锁ツ

2020-11-22 09:10

For a short, fast solution that does the whole thing in one loop, without dependencies, the code below works great.

mylist = [1, 2, 3, 4, 5, 6, 7]
N = 3
cumsum, moving_aves = [0], []

for i, x in enumerate(mylist, 1):
    cumsum.append(cumsum[i-1] + x)
    if i>=N:
        moving_ave = (cumsum[i] - cumsum[i-N])/N
        #can do stuff with moving_ave here
        moving_aves.append(moving_ave)

0 讨论(0)

故里飘歌

2020-11-22 09:13

A bit late to the party, but I've made my own little function that does NOT wrap around the ends or pads with zeroes that are then used to find the average as well. As a further treat is, that it also re-samples the signal at linearly spaced points. Customize the code at will to get other features.

The method is a simple matrix multiplication with a normalized Gaussian kernel.

def running_mean(y_in, x_in, N_out=101, sigma=1):
    '''
    Returns running mean as a Bell-curve weighted average at evenly spaced
    points. Does NOT wrap signal around, or pad with zeros.

    Arguments:
    y_in -- y values, the values to be smoothed and re-sampled
    x_in -- x values for array

    Keyword arguments:
    N_out -- NoOf elements in resampled array.
    sigma -- 'Width' of Bell-curve in units of param x .
    '''
    N_in = size(y_in)

    # Gaussian kernel
    x_out = np.linspace(np.min(x_in), np.max(x_in), N_out)
    x_in_mesh, x_out_mesh = np.meshgrid(x_in, x_out)
    gauss_kernel = np.exp(-np.square(x_in_mesh - x_out_mesh) / (2 * sigma**2))
    # Normalize kernel, such that the sum is one along axis 1
    normalization = np.tile(np.reshape(sum(gauss_kernel, axis=1), (N_out, 1)), (1, N_in))
    gauss_kernel_normalized = gauss_kernel / normalization
    # Perform running average as a linear operation
    y_out = gauss_kernel_normalized @ y_in

    return y_out, x_out

A simple usage on a sinusoidal signal with added normal distributed noise:

0 讨论(0)