I have a time series "Ser" and I want to compute volatilities (standard deviations) with a rolling window. My current code does this correctly, in the following form:
w = 10
length = Ser.shape[0] - w + 1
volList = []
for timestep in range(length):
    subSer = Ser[timestep:timestep+w]
    mean_i = np.mean(subSer)
    vol_i = (np.sum((subSer - mean_i)**2) / len(subSer))**0.5
    volList.append(vol_i)
"Volatility" is ambiguous even in a financial sense. The most commonly referenced type of volatility is realized volatility which is the square root of realized variance. The key differences from the standard deviation of returns are:
There are a variety of methods for computing realized volatility; however, I have implemented the two most common below:
import numpy as np
window = 21 # trading days in rolling window
dpy = 252 # trading days per year
ann_factor = dpy / window
df['log_rtn'] = np.log(df['price']).diff()
# Var Swap (returns are not demeaned)
df['real_var'] = np.square(df['log_rtn']).rolling(window).sum() * ann_factor
df['real_vol'] = np.sqrt(df['real_var'])
# Classical (returns are demeaned, dof=1)
df['real_var'] = df['log_rtn'].rolling(window).var() * ann_factor
df['real_vol'] = np.sqrt(df['real_var'])
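For a self-contained run, here is a minimal sketch of the above on synthetic data (the random-walk price path, the seed, and the vol_swap/vol_classic column names are made up for illustration; the original overwrites real_var/real_vol instead):

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # synthetic daily closes, illustration only
df = pd.DataFrame({'price': 100 * np.exp(np.cumsum(rng.normal(0, 0.01, 504)))})

window = 21
ann_factor = 252 / window

df['log_rtn'] = np.log(df['price']).diff()

# Var-swap-style vs classical realized vol, kept in separate columns here
df['vol_swap'] = np.sqrt(np.square(df['log_rtn']).rolling(window).sum() * ann_factor)
df['vol_classic'] = np.sqrt(df['log_rtn'].rolling(window).var() * ann_factor)

print(df[['vol_swap', 'vol_classic']].dropna().head())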
It looks like you are looking for Series.rolling. You can apply the std calculation to the resulting Rolling object:
roller = Ser.rolling(w)
volList = roller.std(ddof=0)
If you don't plan on using the rolling window object again, you can write a one-liner:
volList = Ser.rolling(w).std(ddof=0)
Keep in mind that ddof=0 is necessary in this case because the standard deviation is normalized by the number of observations in the window minus ddof, and ddof defaults to 1 in pandas.
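As a quick illustration of that normalization (the toy series and window here are made up), the two conventions differ by exactly a factor of sqrt((w - 1) / w):

import numpy as np
import pandas as pd

Ser = pd.Series([1.0, 2.0, 4.0, 7.0, 11.0, 16.0])  # toy data
w = 3

pop = Ser.rolling(w).std(ddof=0)  # normalized by w (population)
samp = Ser.rolling(w).std()       # normalized by w - 1 (pandas default)

# Rescaling the sample estimate recovers the population estimate
assert np.allclose(samp * np.sqrt((w - 1) / w), pop, equal_nan=True)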
Typically, [finance-type] people quote volatility in annualized terms of percent changes in price. Assuming you have daily prices in a dataframe df and that there are 252 trading days in a year, something like the following is probably what you want:
df.pct_change().rolling(window_size).std()*(252**0.5)
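Put together as a runnable sketch (the synthetic 'close' column and the 21-day window below are made up for illustration):

import numpy as np
import pandas as pd

rng = np.random.default_rng(1)  # synthetic daily closes
df = pd.DataFrame({'close': 100 * np.exp(np.cumsum(rng.normal(0, 0.01, 252)))})

window_size = 21
ann_vol = df['close'].pct_change().rolling(window_size).std() * (252 ** 0.5)
print(ann_vol.dropna().head())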
Here's one NumPy approach -
import numpy as np

# From http://stackoverflow.com/a/14314054/3293881 by @Jaime
def moving_average(a, n=3):
    ret = np.cumsum(a, dtype=float)
    ret[n:] = ret[n:] - ret[:-n]
    return ret[n - 1:] / n

# From http://stackoverflow.com/a/40085052/3293881
def strided_app(a, L, S=1):  # Window len = L, Stride len/stepsize = S
    nrows = ((a.size - L) // S) + 1
    n = a.strides[0]
    return np.lib.stride_tricks.as_strided(a, shape=(nrows, L), strides=(S*n, n))

def rolling_meansqdiff_numpy(a, w):
    A = strided_app(a, w)             # each row is one length-w window
    B = moving_average(a, w)          # mean of each window
    subs = A - B[:, None]             # deviations from the window mean
    sums = np.einsum('ij,ij->i', subs, subs)  # row-wise sum of squares
    return (sums / w)**0.5            # population std (ddof=0) per window
Sample run -
In [202]: Ser = pd.Series(np.random.randint(0,9,(20)))
In [203]: rolling_meansqdiff_loopy(Ser, w=10)
Out[203]:
[2.6095976701399777,
2.3000000000000003,
2.118962010041709,
2.022374841615669,
1.746424919657298,
1.7916472867168918,
1.3000000000000003,
1.7776388834631178,
1.6852299546352716,
1.6881943016134133,
1.7578395831246945]
In [204]: rolling_meansqdiff_numpy(Ser.values, w=10)
Out[204]:
array([ 2.60959767, 2.3 , 2.11896201, 2.02237484, 1.74642492,
1.79164729, 1.3 , 1.77763888, 1.68522995, 1.6881943 ,
1.75783958])
Runtime test
Loopy approach -
def rolling_meansqdiff_loopy(Ser, w):
    length = Ser.shape[0] - w + 1
    volList = []
    for timestep in range(length):
        subSer = Ser[timestep:timestep+w]
        mean_i = np.mean(subSer)
        vol_i = (np.sum((subSer - mean_i)**2) / len(subSer))**0.5
        volList.append(vol_i)
    return volList
Timings -
In [223]: Ser = pd.Series(np.random.randint(0,9,(10000)))
In [224]: %timeit rolling_meansqdiff_loopy(Ser, w=10)
1 loops, best of 3: 2.63 s per loop
# @Mad Physicist's vectorized soln
In [225]: %timeit Ser.rolling(10).std(ddof=0)
1000 loops, best of 3: 380 µs per loop
In [226]: %timeit rolling_meansqdiff_numpy(Ser.values, w=10)
1000 loops, best of 3: 393 µs per loop
A speedup of close to 7000x with the two vectorized approaches over the loopy one!
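As a side note, on NumPy 1.20+ the manual as_strided bookkeeping in strided_app can be handed off to np.lib.stride_tricks.sliding_window_view; a minimal sketch of the same ddof=0 rolling std under that assumption (rolling_std_swv is a hypothetical name, not from the answers above):

import numpy as np

def rolling_std_swv(a, w):
    # All length-w windows as rows of a read-only view, then per-row std
    windows = np.lib.stride_tricks.sliding_window_view(a, w)
    return windows.std(axis=1)  # ddof=0 is NumPy's default

This should match the vectorized outputs above while leaving the stride arithmetic to NumPy.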