Running or sliding median, mean and standard deviation

别等时光非礼了梦想. Submitted on 2019-12-06 12:45:53

Question


I am trying to calculate the running median, mean and std of a large array. I know how to calculate the running mean as below:

import numpy as np

def running_mean(x, N):
    # Prepend 0 so that cumsum[k] is the sum of the first k elements of x
    cumsum = np.cumsum(np.insert(x, 0, 0))
    return (cumsum[N:] - cumsum[:-N]) / float(N)

This works very efficiently. But I do not quite understand why (cumsum[N:] - cumsum[:-N]) / float(N) gives the mean value (I borrowed it from someone else).

I tried to add another return statement to calculate the median, but it does not do what I want.

return (cumsum[N:] - cumsum[:-N]) / float(N), np.median(cumsum[N:] - cumsum[:-N])

Can anyone offer me a hint on how to approach this problem? Thank you very much.

Huanian Zhang


Answer 1:


That cumsum trick is specific to sums and averages, and I don't think you can extend it simply to get median and std values. One approach to applying a generic NumPy reduction over a sliding/running window on a 1D array is to build the indices of every 1D window, stacked as the rows of a 2D array, and then apply the function along the window axis (axis=1). To get those indices, you can use broadcasting.

Thus, a running mean would look like this:

# Row i of idx holds the indices i, i+1, ..., i+N-1 of the i-th window
idx = np.arange(N) + np.arange(len(x)-N+1)[:,None]
out = np.mean(x[idx],axis=1)

For running median and std, just replace np.mean with np.median and np.std respectively.
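As a small sketch of this approach (the sample array and the helper name `sliding_apply` are assumptions for illustration, not part of the original answer), the index matrix can be wrapped in a function and reused for any reduction. The last lines also hint at why the cumsum expression from the question yields the mean: `cumsum[i+N] - cumsum[i]` is exactly the sum of the window `x[i:i+N]`, so dividing by N gives that window's average.

```python
import numpy as np

def sliding_apply(x, N, func):
    """Apply func to every length-N window of a 1D array (hypothetical helper)."""
    x = np.asarray(x)
    # Row i holds the indices i, i+1, ..., i+N-1 of the i-th window
    idx = np.arange(N) + np.arange(len(x) - N + 1)[:, None]
    return func(x[idx], axis=1)

x = np.array([1.0, 3.0, 2.0, 6.0, 4.0])   # made-up sample data
N = 3                                      # windows: [1,3,2], [3,2,6], [2,6,4]

means = sliding_apply(x, N, np.mean)
medians = sliding_apply(x, N, np.median)   # medians are 2, 3, 4 for this x
stds = sliding_apply(x, N, np.std)

# The cumsum trick from the question, for comparison with `means`
cumsum = np.cumsum(np.insert(x, 0, 0))
cumsum_means = (cumsum[N:] - cumsum[:-N]) / float(N)
```

Note that `x[idx]` materialises a (len(x)-N+1) by N copy of the data, so memory grows with the window size. On NumPy 1.20 or later, `np.lib.stride_tricks.sliding_window_view(x, N)` may be worth considering instead, since it returns the same windows as a view.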




Answer 2:


To estimate the mean and standard deviation of a given sample set, there exist incremental algorithms (std, mean) which help you keep the computational load low and do the estimation online. Computing the median exactly requires sorting, but you can approximate it. Let x(t) be your data at a given time t, m(t) the median estimate at time t, m(t-1) the median estimate one step before, and e a small number, e.g. e = 0.001. Then

m(t) = m(t-1) + e, if m(t-1) < x(t)

m(t) = m(t-1) - e, if m(t-1) > x(t)

m(t) = m(t-1), else

If you have a good initial guess of the median m(0), this works well. e should be chosen in relation to the range of your values and how many samples you expect. E.g. if x = [-4, 2, 7.5, 2], e = 0.05 would be good; if x = [1000, 3153, -586, -29], e = 10.
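The two ideas in this answer can be sketched as follows. For the mean and standard deviation, the sketch uses Welford's update, which is one well-known incremental scheme; whether it is the one the answer had in mind is an assumption. The function names and the choice of population standard deviation (ddof=0, as in np.std's default) are also assumptions made for illustration.

```python
import math

def online_mean_std(samples):
    """One-pass mean and population std using Welford's update."""
    count, mean, m2 = 0, 0.0, 0.0
    for value in samples:
        count += 1
        delta = value - mean
        mean += delta / count
        m2 += delta * (value - mean)   # second factor uses the updated mean
    std = math.sqrt(m2 / count) if count else float("nan")
    return mean, std

def approximate_median(samples, initial_guess, step):
    """Move the estimate by a fixed step e toward each new sample."""
    estimate = initial_guess           # m(0)
    for value in samples:
        if estimate < value:
            estimate += step           # m(t) = m(t-1) + e
        elif estimate > value:
            estimate -= step           # m(t) = m(t-1) - e
        # otherwise m(t) = m(t-1): leave the estimate unchanged
    return estimate
```

Both functions make a single pass and keep only a few scalars, so they suit streaming data. They estimate statistics of the whole stream seen so far, though, not of a fixed-length sliding window, and the median estimate can never be closer to the true value than roughly one step e.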




Answer 3:


Let me introduce a wrapper to get moving "anything":

import numpy as np

def runningFoo(operation):
    """ Make function that applies central running window operation
    """
    assert hasattr(np, operation), f"numpy has no method '{operation}'"

    method = getattr(np, operation)
    assert callable(method), f"numpy.{operation} is not callable"

    def closure(X, windowSize):
        assert windowSize % 2 == 1, "window size must be odd"
        assert windowSize <= len(X), "sequence must be longer than window"

        # setup index matrix
        half = windowSize // 2
        row = np.arange(windowSize) - half
        col = np.arange(len(X))
        index = row + col[:, None]

        # reflect boundaries
        row, col = np.triu_indices(half)
        upper = (row, half - 1 - col)
        index[upper] = np.abs(index[upper]) % len(X)
        lower = (len(X) - 1 - row, windowSize - 1 - upper[1])
        index[lower] = (len(X) - 2 - index[lower]) % len(X)

        return method(X[index], axis=1)

    return closure

For example, if you'd like a running mean, you may call runningFoo("mean"). In fact, you may pass the name of any suitable NumPy function that accepts an axis argument. For example, runningFoo("max") gives a morphological dilation and runningFoo("min") gives a morphological erosion:

runningStd = runningFoo("std")
runningStd(np.arange(10), windowSize=3)

Make sure that the window size is odd. Also, please note that boundary points are reflected, so the output has the same length as the input.
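To make the boundary handling more concrete, here is a minimal sketch of the same centered window built with `np.pad(..., mode="reflect")` instead of the index manipulation above. The helper name `running_reflect` is hypothetical, and treating the pad-based version as equivalent rests on the assumption that reflection without repeating the edge sample is the intended rule, which is how the index code above reads.

```python
import numpy as np

def running_reflect(x, window_size, func):
    """Centered running window with reflected edges (illustrative helper)."""
    assert window_size % 2 == 1, "window size must be odd"
    x = np.asarray(x)
    half = window_size // 2
    # mode="reflect" mirrors around the edge sample without repeating it,
    # e.g. [0, 1, 2, 3] padded by 1 becomes [1, 0, 1, 2, 3, 2]
    padded = np.pad(x, half, mode="reflect")
    # Row i holds the padded indices of the window centered on x[i]
    idx = np.arange(window_size) + np.arange(len(x))[:, None]
    return func(padded[idx], axis=1)

result = running_reflect(np.arange(10), 3, np.std)
```

For `np.arange(10)` with a window of 3, the first window is [1, 0, 1] and the last is [8, 9, 8], so the two edge values of `result` are smaller than the interior ones, where every window spans three consecutive integers.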



Source: https://stackoverflow.com/questions/33585578/running-or-sliding-median-mean-and-standard-deviation
