Python pandas calculate rolling stock beta using rolling apply to groupby object in vectorized fashion

一向 2020-12-17 06:45

I have a large data frame, df, containing 4 columns:

             id           period  ret_1m   mkt_ret_1m
131146       CAN00WG0     199609 -0.1538    0.0471         


        
3 Answers
  • 2020-12-17 06:55
    def rolling_apply(df, period, func, min_periods=None):
        if min_periods is None:
            min_periods = period
        result = pd.Series(np.nan, index=df.index)
    
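        for i in range(1, len(df)):
            sub_df = df.iloc[max(i-period, 0):i,:] # window of up to `period` rows ending just before row i
            if len(sub_df) >= min_periods:
                # Avoid look-ahead bias: the return at time t must not be included in the beta computed for time t,
                # so the result is stored at row i, which the window excludes. Assigning by position also works
                # when the index labels are not consecutive integers (index[-1]+1 may not exist as a label).
                result.iloc[i] = func(sub_df)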
        return result
    

    I fixed a look-ahead bias in Happy001's code. This is a finance problem, so details like this need care.

    I think vlmercado's answer is wrong. Simply applying pd.rolling_cov and pd.rolling_var to the whole frame ignores the grouping by stock, which is a mistake in finance. First, the second stock, CAN00WH0, has no NaN betas at all, because its first windows use returns of CAN00WG0. Second, consider a stock that was suspended for ten years: a window defined by a number of rows still pulls the observations from before the suspension into the beta calculation.
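
    The first problem can be illustrated with a small sketch. The data below are synthetic and the ids AAA and BBB are made up; only the column names follow the question. The ungrouped calculation returns a beta for the first rows of the second stock, while a per-stock calculation leaves them NaN until a full window of that stock's own data is available.

```python
import numpy as np
import pandas as pd

# Synthetic data (assumption): two hypothetical stocks with 8 monthly returns each.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "id": ["AAA"] * 8 + ["BBB"] * 8,
    "ret_1m": rng.normal(0.0, 0.05, 16),
    "mkt_ret_1m": rng.normal(0.0, 0.03, 16),
})
window = 6

# Ungrouped: the window slides straight across the boundary between the stocks.
naive = (df["ret_1m"].rolling(window).cov(df["mkt_ret_1m"])
         / df["mkt_ret_1m"].rolling(window).var())

# Grouped: each stock gets its own windows, so its first window-1 rows stay NaN.
parts = []
for _, g in df.groupby("id"):
    cov = g["ret_1m"].rolling(window).cov(g["mkt_ret_1m"])
    var = g["mkt_ret_1m"].rolling(window).var()
    parts.append(cov / var)
grouped = pd.concat(parts).reindex(df.index)

print(pd.DataFrame({"id": df["id"], "naive": naive, "grouped": grouped}))
```

    For the first stock both columns agree; they differ only where the ungrouped window mixes rows of two stocks.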

    I found that pandas rolling also accepts time-based windows on a Timestamp index, but it did not seem to work with groupby, so I modified Happy001's code. It is not the fastest approach, but it is at least 20x faster than the original code.

    import pandas as pd

    # crsp_daily is expected to contain the columns date, permno, ret and mkt_ret
    crsp_daily['date'] = pd.to_datetime(crsp_daily['date'])
    crsp_daily = crsp_daily.set_index('date')  # time-based rolling needs a datetime index
    crsp_daily.index = pd.DatetimeIndex(crsp_daily.index)
    calc = crsp_daily[['permno', 'ret', 'mkt_ret']]
    grp = calc.groupby('permno')  # rolling beta for each stock
    pieces = []
    for stock, sub_df in grp:
        sub2_df = sub_df[['ret', 'mkt_ret']].sort_index()
        # 5-year rolling window; 'd' means days. Offsets such as w/m/y cannot be used here, s/d are available.
        beta_m = sub2_df.rolling('1825d', min_periods=150).cov()
        # on the 'mkt_ret' rows of the pairwise result this is cov(ret, mkt_ret) / var(mkt_ret)
        beta_m['beta'] = beta_m['ret'] / beta_m['mkt_ret']
        beta_m = beta_m.xs('mkt_ret', level=1, axis=0)
        # join on the date index (a plain pd.merge finds no common column to merge on)
        pieces.append(sub_df.join(beta_m['beta']))
    beta = pd.concat(pieces)  # DataFrame.append was removed in pandas 2.0
    beta = beta.reset_index()
    beta = beta[['date', 'permno', 'beta']]
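
    To illustrate the suspended-stock point, here is a minimal sketch for a single stock. Everything in it is an assumption made for the example: the data are synthetic, rolling_beta_time is a hypothetical helper, and min_periods=6 is chosen only to suit the tiny sample. Instead of the pairwise .cov() used above, it builds beta from rolling means, relying on the fact that the n/(n-1) corrections cancel in the ratio cov/var.

```python
import numpy as np
import pandas as pd

def rolling_beta_time(ret, mkt, window="1825D", min_periods=6):
    """Time-windowed beta for ONE stock; both Series need a sorted DatetimeIndex."""
    kw = dict(window=window, min_periods=min_periods)
    mean_r = ret.rolling(**kw).mean()
    mean_m = mkt.rolling(**kw).mean()
    mean_rm = (ret * mkt).rolling(**kw).mean()
    mean_mm = (mkt * mkt).rolling(**kw).mean()
    # cov / var; the sample-size corrections cancel in the ratio.
    return (mean_rm - mean_r * mean_m) / (mean_mm - mean_m ** 2)

# Hypothetical stock: 8 observations in 2000, a long suspension, then 8 more in 2012.
dates = pd.date_range("2000-01-31", periods=8, freq="30D").append(
    pd.date_range("2012-01-31", periods=8, freq="30D"))
rng = np.random.default_rng(1)
ret = pd.Series(rng.normal(0.0, 0.05, 16), index=dates)
mkt = pd.Series(rng.normal(0.0, 0.03, 16), index=dates)

by_time = rolling_beta_time(ret, mkt)                      # 5-year calendar window
by_rows = ret.rolling(6).cov(mkt) / mkt.rolling(6).var()   # last 6 rows, whatever their dates

print(pd.DataFrame({"by_time": by_time, "by_rows": by_rows}))
```

    Right after the suspension, by_rows already reports betas that mix returns from 2000 and 2012, while by_time stays NaN until six observations fall inside the five-year window.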
    
  • 2020-12-17 07:02

    I guess pd.rolling_apply doesn't help in this case, since it essentially only takes a Series (even if a DataFrame is passed, it processes one column at a time). But you can always write your own rolling_apply that takes a DataFrame.

    import pandas as pd
    import numpy as np
    from io import StringIO
    
    df = pd.read_csv(StringIO('''              id  period  ret_1m  mkt_ret_1m
    131146  CAN00WG0  199609 -0.1538    0.047104
    133530  CAN00WG0  199610 -0.0455   -0.014143
    135913  CAN00WG0  199611  0.0000    0.040926
    138334  CAN00WG0  199612  0.2952    0.008723
    140794  CAN00WG0  199701 -0.0257    0.039916
    143274  CAN00WG0  199702 -0.0038   -0.025442
    145754  CAN00WG0  199703 -0.2992   -0.049279
    148246  CAN00WG0  199704 -0.0919   -0.005948
    150774  CAN00WG0  199705  0.0595    0.122322
    153318  CAN00WG0  199706 -0.0337    0.045765
    160980  CAN00WH0  199709  0.0757    0.079293
    163569  CAN00WH0  199710 -0.0741   -0.044000
    166159  CAN00WH0  199711  0.1000   -0.014644
    168782  CAN00WH0  199712 -0.0909   -0.007072
    171399  CAN00WH0  199801 -0.0100    0.001381
    174022  CAN00WH0  199802  0.1919    0.081924
    176637  CAN00WH0  199803  0.0085    0.050415
    179255  CAN00WH0  199804 -0.0168    0.018393
    181880  CAN00WH0  199805  0.0427   -0.051279
    184516  CAN00WH0  199806 -0.0656   -0.011516
    143275  CAN00WO0  199702 -0.1176   -0.025442
    145755  CAN00WO0  199703 -0.0074   -0.049279
    148247  CAN00WO0  199704 -0.0075   -0.005948
    150775  CAN00WO0  199705  0.0451    0.122322'''), sep=r'\s+')
    
    
    
    def calc_beta(df):
        np_array = df.values
        s = np_array[:,0] # stock returns are column zero from numpy array
        m = np_array[:,1] # market returns are column one from numpy array
    
        covariance = np.cov(s,m) # Calculate covariance between stock and market
        beta = covariance[0,1]/covariance[1,1]
        return beta
    
    def rolling_apply(df, period, func, min_periods=None):
        if min_periods is None:
            min_periods = period
        result = pd.Series(np.nan, index=df.index)
    
        for i in range(1, len(df)+1):
            sub_df = df.iloc[max(i-period, 0):i,:] # the last `period` rows up to and including row i-1
            if len(sub_df) >= min_periods:
                idx = sub_df.index[-1]
                result[idx] = func(sub_df)
        return result
    
    df['beta'] = np.nan
    grp = df.groupby('id')
    period = 6 # using 6 to get some non-NaN values, since no group in the sample data is as long as 12 rows
    for stock, sub_df in grp:
        beta = rolling_apply(sub_df[['ret_1m','mkt_ret_1m']], period, calc_beta, min_periods = period)  
        beta.name = 'beta'
        df.update(beta)
    print(df)
    

    Output

                id  period  ret_1m  mkt_ret_1m      beta
    131146  CAN00WG0  199609 -0.1538    0.047104       NaN
    133530  CAN00WG0  199610 -0.0455   -0.014143       NaN
    135913  CAN00WG0  199611  0.0000    0.040926       NaN
    138334  CAN00WG0  199612  0.2952    0.008723       NaN
    140794  CAN00WG0  199701 -0.0257    0.039916       NaN
    143274  CAN00WG0  199702 -0.0038   -0.025442 -1.245908
    145754  CAN00WG0  199703 -0.2992   -0.049279  2.574464
    148246  CAN00WG0  199704 -0.0919   -0.005948  2.657887
    150774  CAN00WG0  199705  0.0595    0.122322  1.371090
    153318  CAN00WG0  199706 -0.0337    0.045765  1.494095
    ...          ...     ...     ...         ...       ...
    171399  CAN00WH0  199801 -0.0100    0.001381       NaN
    174022  CAN00WH0  199802  0.1919    0.081924  1.542782
    176637  CAN00WH0  199803  0.0085    0.050415  1.605407
    179255  CAN00WH0  199804 -0.0168    0.018393  1.571015
    181880  CAN00WH0  199805  0.0427   -0.051279  1.139972
    184516  CAN00WH0  199806 -0.0656   -0.011516  1.101890
    143275  CAN00WO0  199702 -0.1176   -0.025442       NaN
    145755  CAN00WO0  199703 -0.0074   -0.049279       NaN
    148247  CAN00WO0  199704 -0.0075   -0.005948       NaN
    150775  CAN00WO0  199705  0.0451    0.122322       NaN
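
    The per-row Python loop in rolling_apply can also be replaced by a NumPy computation over all windows at once. This is only a sketch of that idea: rolling_beta_np and add_beta are hypothetical names, the sample reuses two of the stocks from the data above with a plain integer index, and sliding_window_view requires NumPy 1.20 or later.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20

def rolling_beta_np(stock, market, window):
    """Beta for every full window of `window` rows (len - window + 1 values)."""
    s = sliding_window_view(np.asarray(stock, dtype=float), window)
    m = sliding_window_view(np.asarray(market, dtype=float), window)
    s_dev = s - s.mean(axis=1, keepdims=True)
    m_dev = m - m.mean(axis=1, keepdims=True)
    # Sum of cross products over sum of squares equals cov / var.
    return (s_dev * m_dev).sum(axis=1) / (m_dev ** 2).sum(axis=1)

def add_beta(df, window, id_col="id", ret_col="ret_1m", mkt_col="mkt_ret_1m"):
    """Return a copy of df with a per-stock rolling beta column (unique index assumed)."""
    beta = pd.Series(np.nan, index=df.index)
    for _, g in df.groupby(id_col):
        if len(g) >= window:  # shorter groups keep NaN everywhere
            beta.loc[g.index[window - 1:]] = rolling_beta_np(g[ret_col], g[mkt_col], window)
    return df.assign(beta=beta)

# Two stocks taken from the sample data: one with 10 rows, one with only 4.
sample = pd.DataFrame({
    "id": ["CAN00WG0"] * 10 + ["CAN00WO0"] * 4,
    "ret_1m": [-0.1538, -0.0455, 0.0000, 0.2952, -0.0257, -0.0038, -0.2992,
               -0.0919, 0.0595, -0.0337, -0.1176, -0.0074, -0.0075, 0.0451],
    "mkt_ret_1m": [0.047104, -0.014143, 0.040926, 0.008723, 0.039916, -0.025442,
                   -0.049279, -0.005948, 0.122322, 0.045765, -0.025442, -0.049279,
                   -0.005948, 0.122322],
})
result = add_beta(sample, window=6)
print(result)
```

    The loop over groups remains, but the work inside each group is done in array operations rather than one np.cov call per row.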
    
  • 2020-12-17 07:04

    Try pd.rolling_cov() and pd.rolling_var() (in current pandas, the .rolling(...).cov() and .rolling(...).var() methods) as follows:

    import pandas as pd
    import numpy as np
    from io import StringIO
    
    df = pd.read_csv(StringIO('''              id  period  ret_1m  mkt_ret_1m
    131146  CAN00WG0  199609 -0.1538    0.047104
    133530  CAN00WG0  199610 -0.0455   -0.014143
    135913  CAN00WG0  199611  0.0000    0.040926
    138334  CAN00WG0  199612  0.2952    0.008723
    140794  CAN00WG0  199701 -0.0257    0.039916
    143274  CAN00WG0  199702 -0.0038   -0.025442
    145754  CAN00WG0  199703 -0.2992   -0.049279
    148246  CAN00WG0  199704 -0.0919   -0.005948
    150774  CAN00WG0  199705  0.0595    0.122322
    153318  CAN00WG0  199706 -0.0337    0.045765
    160980  CAN00WH0  199709  0.0757    0.079293
    163569  CAN00WH0  199710 -0.0741   -0.044000
    166159  CAN00WH0  199711  0.1000   -0.014644
    168782  CAN00WH0  199712 -0.0909   -0.007072
    171399  CAN00WH0  199801 -0.0100    0.001381
    174022  CAN00WH0  199802  0.1919    0.081924
    176637  CAN00WH0  199803  0.0085    0.050415
    179255  CAN00WH0  199804 -0.0168    0.018393
    181880  CAN00WH0  199805  0.0427   -0.051279
    184516  CAN00WH0  199806 -0.0656   -0.011516
    143275  CAN00WO0  199702 -0.1176   -0.025442
    145755  CAN00WO0  199703 -0.0074   -0.049279
    148247  CAN00WO0  199704 -0.0075   -0.005948
    150775  CAN00WO0  199705  0.0451    0.122322'''), sep=r'\s+')
    
    # Written as pd.rolling_cov(...) / pd.rolling_var(...) in old pandas; those functions
    # have since been removed, and the .rolling() methods below are the equivalent calls.
    df['beta'] = df['ret_1m'].rolling(window=6).cov(df['mkt_ret_1m']) / df['mkt_ret_1m'].rolling(window=6).var()
    
    print(df)
    

    Output:

                  id  period  ret_1m  mkt_ret_1m      beta
    131146  CAN00WG0  199609 -0.1538    0.047104       NaN
    133530  CAN00WG0  199610 -0.0455   -0.014143       NaN
    135913  CAN00WG0  199611  0.0000    0.040926       NaN
    138334  CAN00WG0  199612  0.2952    0.008723       NaN
    140794  CAN00WG0  199701 -0.0257    0.039916       NaN
    143274  CAN00WG0  199702 -0.0038   -0.025442 -1.245908
    145754  CAN00WG0  199703 -0.2992   -0.049279  2.574464
    148246  CAN00WG0  199704 -0.0919   -0.005948  2.657887
    150774  CAN00WG0  199705  0.0595    0.122322  1.371090
    153318  CAN00WG0  199706 -0.0337    0.045765  1.494095
    160980  CAN00WH0  199709  0.0757    0.079293  1.616520
    163569  CAN00WH0  199710 -0.0741   -0.044000  1.630411
    166159  CAN00WH0  199711  0.1000   -0.014644  0.651220
    168782  CAN00WH0  199712 -0.0909   -0.007072  0.652148
    171399  CAN00WH0  199801 -0.0100    0.001381  0.724120
    174022  CAN00WH0  199802  0.1919    0.081924  1.542782
    176637  CAN00WH0  199803  0.0085    0.050415  1.605407
    179255  CAN00WH0  199804 -0.0168    0.018393  1.571015
    181880  CAN00WH0  199805  0.0427   -0.051279  1.139972
    184516  CAN00WH0  199806 -0.0656   -0.011516  1.101890
    143275  CAN00WO0  199702 -0.1176   -0.025442  1.372437
    145755  CAN00WO0  199703 -0.0074   -0.049279  0.031939
    148247  CAN00WO0  199704 -0.0075   -0.005948 -0.535855
    150775  CAN00WO0  199705  0.0451    0.122322  0.341747
    