How to compute volatility (standard deviation) in a rolling window in Pandas

醉话见心 2021-02-08 04:23

I have a time series "Ser" and I want to compute volatilities (standard deviations) over a rolling window. My current code does this correctly in the following form:

w=10         


        
4 answers
  •  难免孤独
    2021-02-08 05:03

    Here's one NumPy approach -

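    import numpy as np
    import pandas as pd  # used in the sample run below
    
    # From http://stackoverflow.com/a/14314054/3293881 by @Jaime
    def moving_average(a, n=3):
        # Running sum; subtracting the sum lagged by n leaves each window's sum
        ret = np.cumsum(a, dtype=float)
        ret[n:] = ret[n:] - ret[:-n]
        return ret[n - 1:] / n
    
    # From http://stackoverflow.com/a/40085052/3293881
    def strided_app(a, L, S=1):  # Window len = L, Stride len/stepsize = S
        # 2D view of the sliding windows (no copy); a must be a 1D NumPy array
        nrows = ((a.size - L) // S) + 1
        n = a.strides[0]
        return np.lib.stride_tricks.as_strided(a, shape=(nrows, L), strides=(S * n, n))
    
    def rolling_meansqdiff_numpy(a, w):
        A = strided_app(a, w)                     # one window per row
        B = moving_average(a, w)                  # mean of each window
        subs = A - B[:, None]                     # deviations from the window mean
        sums = np.einsum('ij,ij->i', subs, subs)  # row-wise sum of squares
        return (sums / w) ** 0.5                  # population std (divides by w)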
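
As a cross-check on the strided approach above, here is a minimal sketch of the same rolling population standard deviation built on `numpy.lib.stride_tricks.sliding_window_view` (available in NumPy 1.20 and later), which returns a read-only view and avoids computing strides by hand. The function name `rolling_std_windows` and the variable names are made up for this illustration and do not come from the original answer; the result is compared against a plain Python loop.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # requires NumPy >= 1.20


def rolling_std_windows(a, w):
    """Population (ddof=0) standard deviation of every length-w window."""
    a = np.asarray(a, dtype=float)
    windows = sliding_window_view(a, w)  # read-only view, shape (len(a) - w + 1, w)
    return windows.std(axis=1)           # np.std uses ddof=0 by default


rng = np.random.default_rng(0)
data = rng.integers(0, 9, size=20)
vol = rolling_std_windows(data, w=10)

# Reference computed window by window with an explicit loop
reference = np.array([data[i:i + 10].std() for i in range(len(data) - 10 + 1)])
print(vol.shape)
print(np.allclose(vol, reference))
```

For a pandas Series, pass `Ser.to_numpy()` (or `Ser.values`, as in the sample run) rather than the Series itself.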
    

    Sample run (rolling_meansqdiff_loopy is the loop-based version listed under the runtime test below) -

    In [202]: Ser = pd.Series(np.random.randint(0,9,(20)))
    
    In [203]: rolling_meansqdiff_loopy(Ser, w=10)
    Out[203]: 
    [2.6095976701399777,
     2.3000000000000003,
     2.118962010041709,
     2.022374841615669,
     1.746424919657298,
     1.7916472867168918,
     1.3000000000000003,
     1.7776388834631178,
     1.6852299546352716,
     1.6881943016134133,
     1.7578395831246945]
    
    In [204]: rolling_meansqdiff_numpy(Ser.values, w=10)
    Out[204]: 
    array([ 2.60959767,  2.3       ,  2.11896201,  2.02237484,  1.74642492,
            1.79164729,  1.3       ,  1.77763888,  1.68522995,  1.6881943 ,
            1.75783958])
    

    Runtime test

    Loopy approach -

    def rolling_meansqdiff_loopy(Ser, w):
        length = Ser.shape[0]- w + 1
        volList= []
        for timestep in range(length):
            subSer=Ser[timestep:timestep+w]
            mean_i=np.mean(subSer)
            vol_i=(np.sum((subSer-mean_i)**2)/len(subSer))**0.5
            volList.append(vol_i)
        return volList
    

    Timings -

    In [223]: Ser = pd.Series(np.random.randint(0,9,(10000)))
    
    In [224]: %timeit rolling_meansqdiff_loopy(Ser, w=10)
    1 loops, best of 3: 2.63 s per loop
    
    # @Mad Physicist's vectorized soln
    In [225]: %timeit Ser.rolling(10).std(ddof=0)
    1000 loops, best of 3: 380 µs per loop
    
    In [226]: %timeit rolling_meansqdiff_numpy(Ser.values, w=10)
    1000 loops, best of 3: 393 µs per loop
    

    Both vectorized approaches are close to 7000x faster than the loopy one (2.63 s versus roughly 0.38 to 0.39 ms per call)!
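
One detail worth spelling out when using the pandas one-liner from the timings: `rolling(...).std()` defaults to `ddof=1` (sample standard deviation), whereas the loop and the NumPy version divide by the window length, which corresponds to `ddof=0`. The sketch below uses a small made-up series (the values and the names `ser`, `pop`, `samp` are illustrative only) to show the difference and the leading NaN entries that pandas emits before the first full window.

```python
import numpy as np
import pandas as pd

ser = pd.Series([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
w = 4

pop = ser.rolling(w).std(ddof=0)  # divides by w, like the loop-based version
samp = ser.rolling(w).std()       # pandas default ddof=1, divides by w - 1

# The first w - 1 results are NaN because the window is not yet full
print(pop.isna().sum())

# The two estimates differ by the constant factor sqrt(w / (w - 1))
print(np.allclose(samp.dropna(), pop.dropna() * np.sqrt(w / (w - 1))))

# Drop the NaN prefix to get one value per full window
vol = pop.dropna().to_numpy()
print(vol.shape)
```

Dropping the NaN prefix gives an array of length `len(ser) - w + 1`, the same length as the list returned by the loop.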
