Is there a SciPy function or NumPy function or module for Python that calculates the running mean of a 1D array given a specific window?
Convolution is much better than the straightforward approach, but (I guess) it uses FFT and is thus quite slow. However, specifically for computing the running mean, the following approach works fine:
import numpy

def running_mean(x, N):
    # prepend a zero so the windowed differences below line up
    cumsum = numpy.cumsum(numpy.insert(x, 0, 0))
    # sum of every length-N window, divided by N
    return (cumsum[N:] - cumsum[:-N]) / float(N)
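For reference, the result has len(x) - N + 1 elements, matching numpy.convolve with mode='valid'. A quick sketch:

x = numpy.arange(10, dtype=float)
print(running_mean(x, 3))
# [1. 2. 3. 4. 5. 6. 7. 8.]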
The code to check:
In[3]: x = numpy.random.random(100000)
In[4]: N = 1000
In[5]: %timeit result1 = numpy.convolve(x, numpy.ones((N,))/N, mode='valid')
10 loops, best of 3: 41.4 ms per loop
In[6]: %timeit result2 = running_mean(x, N)
1000 loops, best of 3: 1.04 ms per loop
Note that numpy.allclose(result1, result2) is True, i.e., the two methods are equivalent.
The greater N is, the greater the difference in time.
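You can verify this scaling yourself with something like the following (a sketch; the exact timings depend on your machine, and running_mean is the cumsum-based function above):

import timeit
import numpy

x = numpy.random.random(100000)
for N in (100, 1000, 10000):
    # time 10 runs of each approach at this window size
    t_conv = timeit.timeit(
        lambda: numpy.convolve(x, numpy.ones(N) / N, mode='valid'), number=10)
    t_cumsum = timeit.timeit(lambda: running_mean(x, N), number=10)
    print(N, t_conv, t_cumsum)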
The comments pointed out the floating point error issue, but I am making it more obvious here in the answer.
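The helper functions below are not defined elsewhere in this answer; a minimal sketch, assuming running_mean_convolve and running_mean_cumsum wrap the convolve- and cumsum-based approaches from above:

import numpy as np

def running_mean_convolve(x, N):
    # each window is summed independently, so no error accumulates
    return np.convolve(x, np.ones(N) / N, mode='valid')

def running_mean_cumsum(x, N):
    # fast, but the cumulative sum accumulates floating point error
    cumsum = np.cumsum(np.insert(x, 0, 0))
    return (cumsum[N:] - cumsum[:-N]) / N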
# demonstrate loss of precision with only 100,000 points
np.random.seed(42)
x = np.random.randn(100000) + 1e6
y1 = running_mean_convolve(x, 10)
y2 = running_mean_cumsum(x, 10)
# this assertion fails: the cumsum result drifts away from the true window means
assert np.allclose(y1, y2, rtol=1e-12, atol=0)
I know this is an old question, but here is a solution that doesn't use any extra data structures or libraries. It is linear in the number of elements of the input list and I cannot think of any other way to make it more efficient (actually, if anyone knows of a better way to allocate the result, please let me know).
NOTE: this would be much faster using a numpy array instead of a list, but I wanted to eliminate all dependencies. It would also be possible to improve performance by multi-threaded execution.
The function assumes that the input list is one-dimensional, so be careful.
### Running mean/Moving average
def running_mean(l, N):
    sum = 0
    result = list(0 for x in l)

    # the first N entries are averaged as if the list were
    # zero-padded on the left (see the worked example below)
    for i in range(0, N):
        sum = sum + l[i]
        result[i] = sum / N

    # afterwards, slide the window: add the new element and
    # subtract the one that fell out
    for i in range(N, len(l)):
        sum = sum - l[i-N] + l[i]
        result[i] = sum / N

    return result
Example
Assume that we have a list data = [1, 2, 3, 4, 5, 6] on which we want to compute a rolling mean with a period of 3, and that we also want an output list of the same size as the input (which is most often the case).
The first element has index 0, so the rolling mean should be computed on the elements at indices -2, -1 and 0. Obviously we don't have data[-2] and data[-1] (unless you want to use special boundary conditions), so we assume that those elements are 0. This is equivalent to zero-padding the list, except we don't actually pad it; we just keep track of the indices that require padding (from 0 to N-1).
So, for the first N elements we just keep adding up the elements in an accumulator.
result[0] = (0 + 0 + 1) / 3 = 0.333 == (sum + 1) / 3
result[1] = (0 + 1 + 2) / 3 = 1 == (sum + 2) / 3
result[2] = (1 + 2 + 3) / 3 = 2 == (sum + 3) / 3
From the (N+1)-th element (index N) onwards, simple accumulation doesn't work: we expect result[3] = (2 + 3 + 4) / 3 = 3, but this is different from (sum + 4) / 3 = 3.333.
The way to compute the correct value is to subtract data[0] = 1 from sum + 4, giving sum + 4 - 1 = 9 (and then dividing by 3).
This works because sum currently equals data[0] + data[1] + data[2]; the same holds for every i >= N, since before the subtraction sum is data[i-N] + ... + data[i-2] + data[i-1].
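Putting it together, a quick check of the worked example (using the running_mean list implementation from this answer):

data = [1, 2, 3, 4, 5, 6]
print(running_mean(data, 3))
# [0.3333333333333333, 1.0, 2.0, 3.0, 4.0, 5.0]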
Another approach to find a moving average without using numpy or pandas:
import itertools

sample = [2, 6, 10, 8, 11, 10]
# divide each cumulative sum by the number of elements seen so far;
# note this is a cumulative (expanding) average, not a fixed-size window
list(itertools.starmap(lambda a, b: b / a,
                       enumerate(itertools.accumulate(sample), 1)))
This produces [2.0, 4.0, 6.0, 6.5, 7.4, 7.833333333333333].
All the aforementioned solutions are poor because they lack
- speed due to a native Python instead of a numpy vectorized implementation,
- numerical stability due to poor use of numpy.cumsum, or
- speed due to O(len(x) * w) implementations as convolutions.

Given
import numpy
m = 10000
x = numpy.random.rand(m)
w = 1000
Note that x_[:w].sum() equals x[:w-1].sum(), where x_ = numpy.insert(x, 0, 0) as defined below. So for the first average, the numpy.cumsum(...) adds x[w-1] / w (via x_[w] / w) and subtracts 0 (from x_[0] / w). This results in x[0:w].mean().
Via cumsum, you'll update the second average by additionally adding x[w] / w (via x_[w+1] / w) and subtracting x[0] / w (via x_[1] / w), resulting in x[1:w+1].mean().
This goes on until x[-w:].mean() is reached.
x_ = numpy.insert(x, 0, 0)  # prepend a zero so the window differences align
sliding_average = x_[:w].sum() / w + numpy.cumsum(x_[w:] - x_[:-w]) / w
This solution is vectorized, O(m), readable and numerically stable.
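As a quick sanity check, here is a sketch comparing against the convolution-based moving average used earlier:

# every element of sliding_average should equal the corresponding window mean
expected = numpy.convolve(x, numpy.ones(w) / w, mode='valid')
assert numpy.allclose(sliding_average, expected)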
Another solution using just the standard library and a deque:
from collections import deque
import itertools

def moving_average(iterable, n=3):
    # http://en.wikipedia.org/wiki/Moving_average
    # create an iterator from the input argument
    it = iter(iterable)
    # create a deque holding the first n-1 elements
    d = deque(itertools.islice(it, n-1))
    # prepend a zero so the first popleft() removes nothing real
    d.appendleft(0)
    s = sum(d)
    for elem in it:
        # slide the window: add the new element, drop the oldest
        s += elem - d.popleft()
        d.append(elem)
        yield s / n
# example on how to use it
for i in moving_average([40, 30, 50, 46, 39, 44]):
    print(i)
# 40.0
# 42.0
# 45.0
# 43.0
How about a moving average filter? It is also a one-liner and has the advantage that you can easily manipulate the window type if you need something other than a rectangle, e.g. for an N-long simple moving average of an array a:
import numpy as np
from scipy.signal import lfilter

lfilter(np.ones(N) / N, [1], a)[N:]
And with the triangular window applied (note that scipy.signal.triang has moved to scipy.signal.windows.triang in recent SciPy versions):

from scipy.signal import windows

lfilter(np.ones(N) * windows.triang(N) / N, [1], a)[N:]
Note: I usually discard the first N samples as bogus, hence the [N:] at the end, but this is not necessary and is purely a matter of personal choice.
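To see what that slice discards, you can compare against the convolution-based moving average from earlier (a sketch; from index N - 1 onward the filter output matches it exactly):

import numpy as np
from scipy.signal import lfilter

a = np.random.rand(10000)
N = 50
y = lfilter(np.ones(N) / N, [1], a)
# from index N-1 onward, each output sample is the mean of the last N inputs
assert np.allclose(y[N-1:], np.convolve(a, np.ones(N) / N, mode='valid'))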