Moving average or running mean

庸人自扰 2020-11-22 08:37

Is there a SciPy function or NumPy function or module for Python that calculates the running mean of a 1D array given a specific window?

27 Answers
  • 2020-11-22 09:15

    Efficient solution

    Convolution is much better than the straightforward approach, but (I suspect) it uses FFT and is thus quite slow. However, specifically for computing the running mean, the following approach works fine:

    import numpy

    def running_mean(x, N):
        # prepend a 0 so that cumsum[i] is the sum of the first i elements
        cumsum = numpy.cumsum(numpy.insert(x, 0, 0))
        # difference of cumulative sums N apart = sum over each window
        return (cumsum[N:] - cumsum[:-N]) / float(N)
    

    The code to check:

    In[3]: x = numpy.random.random(100000)
    In[4]: N = 1000
    In[5]: %timeit result1 = numpy.convolve(x, numpy.ones((N,))/N, mode='valid')
    10 loops, best of 3: 41.4 ms per loop
    In[6]: %timeit result2 = running_mean(x, N)
    1000 loops, best of 3: 1.04 ms per loop
    

    Note that numpy.allclose(result1, result2) is True, so the two methods are equivalent. The greater N, the greater the difference in running time.

    Warning: although cumsum is faster, the increased floating point error may make your results invalid/incorrect/unacceptable.

    The comments pointed out this floating point error issue, but I am making it more explicit here in the answer:

    import numpy as np

    def running_mean_convolve(x, N):  # numerically stable
        return np.convolve(x, np.ones(N) / N, mode='valid')

    def running_mean_cumsum(x, N):    # fast, but accumulates rounding error
        cumsum = np.cumsum(np.insert(x, 0, 0))
        return (cumsum[N:] - cumsum[:-N]) / float(N)

    # demonstrate loss of precision with only 100,000 points
    np.random.seed(42)
    x = np.random.randn(100000) + 1e6
    y1 = running_mean_convolve(x, 10)
    y2 = running_mean_cumsum(x, 10)
    assert np.allclose(y1, y2, rtol=1e-12, atol=0)  # fails: cumsum has lost precision
    
    • the more points you accumulate over, the greater the floating point error (so 1e5 points is noticeable, 1e6 points is more significant, and beyond 1e6 you may want to reset the accumulator)
    • you can cheat by using np.longdouble, but the floating point error will still become significant for relatively large numbers of points (around >1e5, though it depends on your data)
    • you can plot the error and watch it grow relatively fast
    • the convolve solution is slower but does not have this floating point loss of precision
    • the uniform_filter1d solution is faster than this cumsum solution AND does not have this floating point loss of precision (see the sketch below)
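
    For reference, a minimal sketch of that uniform_filter1d approach (assuming SciPy is available; note that its default mode='reflect' pads the edges, so the boundary values differ from a mode='valid' convolution):

    import numpy as np
    from scipy.ndimage import uniform_filter1d

    x = np.random.randn(100000) + 1e6
    y = uniform_filter1d(x, size=10)  # centered window, output same length as x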
  • 2020-11-22 09:15

    I know this is an old question, but here is a solution that doesn't use any extra data structures or libraries. It is linear in the number of elements of the input list, and I cannot think of a way to make it more efficient (if anyone knows a better way to allocate the result, please let me know).

    NOTE: this would be much faster using a numpy array instead of a list, but I wanted to eliminate all dependencies. Performance could also be improved by multi-threaded execution.

    The function assumes that the input list is one dimensional, so be careful.

    ### Running mean/Moving average
    def running_mean(l, N):
        sum = 0
        result = list( 0 for x in l)
    
        for i in range( 0, N ):
            sum = sum + l[i]
            result[i] = sum / (i+1)
    
        for i in range( N, len(l) ):
            sum = sum - l[i-N] + l[i]
            result[i] = sum / N
    
        return result
    

    Example

    Assume that we have a list data = [ 1, 2, 3, 4, 5, 6 ] on which we want to compute a rolling mean with a period of 3, and that we also want an output list of the same size as the input (that's most often the case).

    The first element has index 0, so the rolling mean should be computed on the elements with indices -2, -1 and 0. Obviously we don't have data[-2] and data[-1] (unless you want special boundary conditions), so we assume those elements are 0. This is equivalent to zero-padding the list, except we don't actually pad it; we just keep track of the indices that require padding (from 0 to N-1).

    So, for the first N elements we simply keep adding elements to an accumulator and divide the running sum by N:

    result[0] = (0 + 0 + 1) / 3  = 0.333    ==   (sum + 1) / 3
    result[1] = (0 + 1 + 2) / 3  = 1        ==   (sum + 2) / 3
    result[2] = (1 + 2 + 3) / 3  = 2        ==   (sum + 3) / 3
    

    From element N onwards, simple accumulation doesn't work: we expect result[3] = (2 + 3 + 4)/3 = 3, but this differs from (sum + 4)/3 = 3.333.

    The way to compute the correct value is to subtract data[0] = 1 from sum + 4, giving (sum + 4 - 1)/3 = 9/3 = 3.

    This works because at that point sum = data[0] + data[1] + data[2]; the same holds for every i >= N, since before the subtraction sum is data[i-N] + ... + data[i-2] + data[i-1].
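
    Running the function on the example data confirms the walkthrough (a quick usage check):

    data = [1, 2, 3, 4, 5, 6]
    print(running_mean(data, 3))
    # [0.3333333333333333, 1.0, 2.0, 3.0, 4.0, 5.0]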

  • 2020-11-22 09:16

    Another approach to find the moving average, without using numpy or pandas (note that this computes a cumulative mean over all elements so far, not a fixed-window moving average):

    import itertools
    sample = [2, 6, 10, 8, 11, 10]
    list(itertools.starmap(lambda a, b: b / a,
                           enumerate(itertools.accumulate(sample), 1)))
    

    will print [2.0, 4.0, 6.0, 6.5, 7.4, 7.833333333333333]
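
    For comparison, the same values written as an explicit cumulative mean (a sketch using the sample list above):

    sample = [2, 6, 10, 8, 11, 10]
    print([sum(sample[:i + 1]) / (i + 1) for i in range(len(sample))])
    # [2.0, 4.0, 6.0, 6.5, 7.4, 7.833333333333333]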

  • 2020-11-22 09:16

    All the aforementioned solutions are poor because they lack

    • speed, due to a native Python rather than a numpy vectorized implementation,
    • numerical stability, due to poor use of numpy.cumsum, or
    • speed, due to O(len(x) * w) implementations such as convolutions.

    Given

    import numpy
    m = 10000
    x = numpy.random.rand(m)
    w = 1000
    

    x_ = numpy.insert(x, 0, 0)
    sliding_average = x_[:w].sum() / w + numpy.cumsum(x_[w:] - x_[:-w]) / w


    Note that x_[:w].sum() equals x[:w-1].sum(). So for the first average, the numpy.cumsum(...) adds x[w-1] / w (via x_[w] / w) and subtracts 0 (via x_[0] / w), which results in x[0:w].mean().

    Via cumsum, the second average is obtained by additionally adding x[w] / w and subtracting x[0] / w, resulting in x[1:w+1].mean().

    This goes on until x[-w:].mean() is reached.
    

    This solution is vectorized, O(m), readable and numerically stable.
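
    As a quick sanity check (a sketch, reusing x and w from above), the result matches the convolution-based moving average:

    result = numpy.convolve(x, numpy.ones(w) / w, mode='valid')
    assert numpy.allclose(sliding_average, result)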

  • 2020-11-22 09:16

    Another solution, using just the standard library and a deque:

    from collections import deque
    import itertools

    def moving_average(iterable, n=3):
        # http://en.wikipedia.org/wiki/Moving_average
        it = iter(iterable)
        # seed the deque with the first n-1 elements, plus a leading 0
        d = deque(itertools.islice(it, n - 1))
        d.appendleft(0)
        s = sum(d)
        for elem in it:
            # slide the window: add the new element, drop the oldest
            s += elem - d.popleft()
            d.append(elem)
            yield s / n

    # example on how to use it
    for i in moving_average([40, 30, 50, 46, 39, 44]):
        print(i)

    # 40.0
    # 42.0
    # 45.0
    # 43.0
    
  • 2020-11-22 09:16

    How about a moving average filter? It is also a one-liner and has the advantage that you can easily vary the window type if you need something other than a rectangle. For an N-long simple moving average of an array a:

    import numpy as np
    from scipy.signal import lfilter

    lfilter(np.ones(N) / N, [1], a)[N:]
    

    And with a triangular window applied (the weights should be normalized by their sum rather than by N for a true weighted average):

    from scipy.signal import windows

    w = windows.triang(N)
    lfilter(w / w.sum(), [1], a)[N:]
    

    Note: I usually discard the first N samples as bogus, hence the [N:] at the end, but this is not necessary and purely a matter of personal choice.
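
    As a quick check (a sketch, assuming a and N are defined as above), the rectangular-window version agrees with the convolution result once the filter's start-up transient is discarded:

    import numpy as np
    from scipy.signal import lfilter

    a = np.random.rand(10000)
    N = 50
    smoothed = lfilter(np.ones(N) / N, [1], a)
    reference = np.convolve(a, np.ones(N) / N, mode='valid')
    assert np.allclose(smoothed[N - 1:], reference)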
