How to compute volatility (standard deviation) in rolling window in Pandas

醉话见心 2021-02-08 04:23

I have a time series "Ser" and I want to compute volatilities (standard deviations) with a rolling window. My current code correctly does it in this form:

w=10         


        
4 answers
  • 2021-02-08 04:36

    "Volatility" is ambiguous even in a financial sense. The most commonly referenced type of volatility is realized volatility, which is the square root of realized variance. The key differences from the standard deviation of returns are:

    • Log returns (not simple returns) are used
    • The figure is annualized (usually assuming between 252 and 260 trading days per year)
    • In the case of variance swaps, log returns are not demeaned
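Written out, with N daily log returns r_t in the window and D trading days per year (the symbols are chosen here for illustration; D corresponds to the trading-days-per-year constant in the code), the var-swap and classical definitions of realized variance are:

```latex
r_t = \ln P_t - \ln P_{t-1}, \qquad \bar{r} = \frac{1}{N}\sum_{t=1}^{N} r_t

\sigma^2_{\text{swap}} = \frac{D}{N}\sum_{t=1}^{N} r_t^2,
\qquad
\sigma^2_{\text{classical}} = D \cdot \frac{1}{N-1}\sum_{t=1}^{N}\left(r_t - \bar{r}\right)^2

\sigma = \sqrt{\sigma^2}
```

The factor D/N turns the window sum of squared returns into an annualized mean; the sample variance in the classical form is already a per-day quantity, so it is scaled by D alone.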

    There are a variety of methods for computing realized volatility; I have implemented the two most common below:

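    import numpy as np
    import pandas as pd
    
    # Assumes df is a DataFrame with a 'price' column of daily prices
    window = 21  # trading days in rolling window
    dpy = 252  # trading days per year
    ann_factor = dpy / window
    
    df['log_rtn'] = np.log(df['price']).diff()
    
    # Var Swap (returns are not demeaned): annualize the window sum by dpy / window
    df['real_var_swap'] = np.square(df['log_rtn']).rolling(window).sum() * ann_factor
    df['real_vol_swap'] = np.sqrt(df['real_var_swap'])
    
    # Classical (returns are demeaned, ddof=1): rolling var() is already a
    # per-day variance, so it is annualized by dpy rather than dpy / window
    df['real_var_classic'] = df['log_rtn'].rolling(window).var() * dpy
    df['real_vol_classic'] = np.sqrt(df['real_var_classic'])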
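A self-contained sketch of the two estimators on synthetic prices, wrapped in a hypothetical helper realized_vol (the helper name, the random-walk price series and the seed are assumptions for illustration only). It relies on the fact that pandas' rolling var() already divides by window - 1, so the classical estimator is scaled by dpy, while the window sum of squared returns is scaled by dpy / window:

```python
import numpy as np
import pandas as pd


def realized_vol(prices: pd.Series, window: int = 21, dpy: int = 252) -> pd.DataFrame:
    """Hypothetical helper: rolling annualized realized volatility, two conventions."""
    log_rtn = np.log(prices).diff()
    # Var-swap convention: squared log returns are not demeaned; the window sum
    # is scaled by dpy / window, i.e. the mean squared return times dpy.
    var_swap = np.square(log_rtn).rolling(window).sum() * (dpy / window)
    # Classical convention: rolling var() demeans and divides by window - 1,
    # giving a per-day variance that is scaled by dpy.
    var_classic = log_rtn.rolling(window).var() * dpy
    return pd.DataFrame({
        "vol_swap": np.sqrt(var_swap),
        "vol_classic": np.sqrt(var_classic),
    })


# Illustrative data: a geometric random walk with 1% daily log-return noise.
rng = np.random.default_rng(0)
prices = pd.Series(100 * np.exp(np.cumsum(rng.normal(0, 0.01, 300))))
vols = realized_vol(prices, window=21)
print(vols.tail())
```

Both columns start with `window` NaN values: the first log return is NaN, and rolling requires a full window of observations by default.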
    
  • 2021-02-08 04:43

    It looks like you are looking for Series.rolling. You can apply the std calculations to the resulting object:

    roller = Ser.rolling(w)
    volList = roller.std(ddof=0)
    

    If you don't plan on using the rolling window object again, you can write a one-liner:

    volList = Ser.rolling(w).std(ddof=0)
    

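    Keep in mind that ddof=0 is necessary in this case to get the population standard deviation: within each window, the sum of squared deviations is divided by N - ddof, where N is the number of observations in the window (here w), and ddof defaults to 1 in pandas.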
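To see what ddof changes, here is a small sketch (the series and variable names are made up for illustration): for every full window, the pandas default (ddof=1) and ddof=0 differ only by the constant factor sqrt(w / (w - 1)).

```python
import numpy as np
import pandas as pd

w = 10
ser = pd.Series(np.arange(20, dtype=float) ** 1.5)  # illustrative data

pop_std = ser.rolling(w).std(ddof=0)   # divides each window's squared deviations by w
sample_std = ser.rolling(w).std()      # pandas default ddof=1: divides by w - 1

# For full windows the two differ by the constant factor sqrt(w / (w - 1)).
ratio = (sample_std / pop_std).dropna()
print(ratio.head())
```

So switching between the two conventions is a fixed rescaling; which one is "right" depends on whether you want to match a population formula like sum(...)/len(subSer) or an unbiased sample variance.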

  • 2021-02-08 04:46

    Typically, people in finance quote volatility in annualized terms, based on percent changes in price.

    Assuming you have daily prices in a DataFrame df, a rolling window length window_size, and 252 trading days in a year, something like the following is probably what you want:

    df.pct_change().rolling(window_size).std()*(252**0.5)
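A runnable sketch of the same idea on a single price series, with an option to use log returns instead of percent changes (the helper annualized_vol and its parameters periods_per_year and log_returns are hypothetical names, not a pandas API). The sqrt(252) factor assumes daily returns that are roughly independent, so variance grows linearly with time:

```python
import numpy as np
import pandas as pd


def annualized_vol(prices: pd.Series, window_size: int, periods_per_year: int = 252,
                   log_returns: bool = False) -> pd.Series:
    """Hypothetical helper: rolling std of returns, annualized by sqrt(periods_per_year)."""
    if log_returns:
        rtn = np.log(prices).diff()
    else:
        rtn = prices.pct_change()
    # Per-period std (pandas default ddof=1), scaled to a yearly horizon.
    return rtn.rolling(window_size).std() * np.sqrt(periods_per_year)


# Illustrative prices that alternate between +1% and -1% daily moves.
moves = np.where(np.arange(60) % 2 == 0, 1.01, 0.99)
prices = pd.Series(100 * np.cumprod(moves))
vol = annualized_vol(prices, window_size=20)
print(vol.iloc[-1])
```

For small daily moves, percent-change and log-return volatilities are very close; they diverge as moves get larger.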

  • 2021-02-08 05:03

    Here's one NumPy approach -

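    import numpy as np
    import pandas as pd
    
    # From http://stackoverflow.com/a/14314054/3293881 by @Jaime
    # Mean of every length-n window, computed from cumulative sums
    def moving_average(a, n=3):
        ret = np.cumsum(a, dtype=float)
        ret[n:] = ret[n:] - ret[:-n]
        return ret[n - 1:] / n
    
    # From http://stackoverflow.com/a/40085052/3293881
    # 2D strided view of a (no copy): one row per window of length L, windows S apart
    def strided_app(a, L, S=1):  # Window len = L, Stride len/stepsize = S
        nrows = ((a.size-L)//S)+1
        n = a.strides[0]
        return np.lib.stride_tricks.as_strided(a, shape=(nrows,L), strides=(S*n,n))
    
    def rolling_meansqdiff_numpy(a, w):
        A = strided_app(a, w)                    # windows as rows
        B = moving_average(a,w)                  # mean of each window
        subs = A-B[:,None]                       # deviations from the window mean
        sums = np.einsum('ij,ij->i',subs,subs)   # row-wise sum of squared deviations
        return (sums/w)**0.5                     # population std (ddof=0)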
    

    Sample run -

    In [202]: Ser = pd.Series(np.random.randint(0,9,(20)))
    
    In [203]: rolling_meansqdiff_loopy(Ser, w=10)
    Out[203]: 
    [2.6095976701399777,
     2.3000000000000003,
     2.118962010041709,
     2.022374841615669,
     1.746424919657298,
     1.7916472867168918,
     1.3000000000000003,
     1.7776388834631178,
     1.6852299546352716,
     1.6881943016134133,
     1.7578395831246945]
    
    In [204]: rolling_meansqdiff_numpy(Ser.values, w=10)
    Out[204]: 
    array([ 2.60959767,  2.3       ,  2.11896201,  2.02237484,  1.74642492,
            1.79164729,  1.3       ,  1.77763888,  1.68522995,  1.6881943 ,
            1.75783958])
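On NumPy 1.20 or later, numpy.lib.stride_tricks.sliding_window_view builds the same (windows x w) view as strided_app without hand-computed strides, which avoids the memory-safety pitfalls of as_strided. A minimal sketch (the function name rolling_std_swv and the random sample are illustrative, not from the original answer):

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view


def rolling_std_swv(a: np.ndarray, w: int) -> np.ndarray:
    """Population (ddof=0) std of every length-w window, one value per window."""
    # Read-only view of shape (len(a) - w + 1, w); no data is copied.
    windows = sliding_window_view(np.asarray(a, dtype=float), w)
    return windows.std(axis=1)  # np.std defaults to ddof=0


ser = pd.Series(np.random.default_rng(1).integers(0, 9, 20))
print(rolling_std_swv(ser.to_numpy(), 10))
```

Because np.std defaults to ddof=0, this matches Ser.rolling(w).std(ddof=0) once the leading NaNs are dropped. The view itself is free, but std still reads all w elements of every window, so the work is O(n*w) like the einsum version.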
    

    Runtime test

    Loopy approach -

    def rolling_meansqdiff_loopy(Ser, w):
        length = Ser.shape[0]- w + 1
        volList= []
        for timestep in range(length):
            subSer=Ser[timestep:timestep+w]
            mean_i=np.mean(subSer)
            vol_i=(np.sum((subSer-mean_i)**2)/len(subSer))**0.5
            volList.append(vol_i)
        return volList
    

    Timings -

    In [223]: Ser = pd.Series(np.random.randint(0,9,(10000)))
    
    In [224]: %timeit rolling_meansqdiff_loopy(Ser, w=10)
    1 loops, best of 3: 2.63 s per loop
    
    # @Mad Physicist's vectorized soln
    In [225]: %timeit Ser.rolling(10).std(ddof=0)
    1000 loops, best of 3: 380 µs per loop
    
    In [226]: %timeit rolling_meansqdiff_numpy(Ser.values, w=10)
    1000 loops, best of 3: 393 µs per loop
    

    That is a speedup of close to 7000x for the two vectorized approaches over the loopy one!
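The NumPy approach above still processes all w elements of every window. When w is large, a running-sum formulation computes each window in O(1) from cumulative sums of x and x**2, giving O(n) work overall. This is a swapped-in technique, not part of the original answer, and rolling_std_cumsum is a hypothetical name:

```python
import numpy as np
import pandas as pd


def rolling_std_cumsum(a, w):
    """Population std of each length-w window via running sums: var = E[x^2] - E[x]^2."""
    x = np.asarray(a, dtype=float)
    # Prepend 0 so that s[i + w] - s[i] is the sum over x[i:i + w].
    s1 = np.concatenate(([0.0], np.cumsum(x)))
    s2 = np.concatenate(([0.0], np.cumsum(x * x)))
    mean = (s1[w:] - s1[:-w]) / w
    mean_sq = (s2[w:] - s2[:-w]) / w
    # Clip tiny negative variances caused by floating-point cancellation.
    return np.sqrt(np.clip(mean_sq - mean ** 2, 0.0, None))


ser = pd.Series(np.random.default_rng(2).integers(0, 9, 50))
print(rolling_std_cumsum(ser.to_numpy(), 10))
```

The identity E[x^2] - E[x]^2 can lose precision through cancellation when the data have a large mean relative to their spread (for example raw prices rather than returns), so it is worth checking against Ser.rolling(w).std(ddof=0) before relying on it.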
