Question
I hope this is the right place for my question.
I would like to understand how to use the 'hac-panel' cov_type when running sm.OLS. I have struggled with it the whole day but still cannot figure it out. Here is an example of my code (with data):
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
from pandas.tseries.offsets import BQuarterBegin
# Just grabbing some random data here
dat = sm.datasets.macrodata.load_pandas().data
dat['time'] = dat['year'].apply(lambda x: pd.to_datetime(x, format='%Y'))
dat['time'] = dat.apply(lambda x:(x['time'] + BQuarterBegin(x['quarter'])), axis=1)
dat = dat.set_index('time')
dat = dat.sort_index()
dat['dGDP'] = (dat['realgdp'] - dat['realgdp'].shift(1))/dat['realgdp'].shift(1) * 100.0
dat['dM1'] = (dat['m1'] - dat['m1'].shift(1))/dat['m1'].shift(1) * 100.0
dat['dUEMP'] = dat['unemp'] - dat['unemp'].shift(1)
dat['dCPI'] = dat['infl'] - dat['infl'].shift(1)
dat = dat[['dGDP', 'dM1', 'dUEMP', 'dCPI']]
# Fitting the model
y_var = dat.unstack()
x_var = pd.DataFrame(dat.shift(1).unstack(), columns=['01m']).combine_first(pd.DataFrame(dat.shift(3).unstack(), columns=['03m'])).combine_first(pd.DataFrame(dat.shift(12).unstack(), columns=['12m']))
model = sm.OLS(y_var, sm.add_constant(x_var), missing='drop')
This works, and as far as I understand the docs it applies a HAC covariance. However, I am not sure whether I am calling it correctly:
res = model.fit(cov_type='hac-panel', cov_kwds={'time': dat.index, 'maxlags': 11})
res.summary()
Here is where I have a problem. Let's say I want to also cluster by time, which I think should be something like this:
model.fit(cov_type='hac-panel', cov_kwds={'time': dat.index, 'groups': dat.index, 'maxlags': 11})
That call raises the error below. Any help is really appreciated, thank you very much in advance. Even pointing me to an example would be great, since I couldn't find anything.
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-2-74b3e662267b> in <module>
----> 1 model.fit(cov_type='hac-panel', cov_kwds={'time': dat.index, 'groups': dat.index, 'maxlags': 11})
~\anaconda3\lib\site-packages\statsmodels\regression\linear_model.py in fit(self, method, cov_type, cov_kwds, use_t, **kwargs)
343 self, beta,
344 normalized_cov_params=self.normalized_cov_params,
--> 345 cov_type=cov_type, cov_kwds=cov_kwds, use_t=use_t)
346 else:
347 lfit = RegressionResults(
~\anaconda3\lib\site-packages\statsmodels\regression\linear_model.py in __init__(self, model, params, normalized_cov_params, scale, cov_type, cov_kwds, use_t, **kwargs)
1555 # TODO: warn or not?
1556 self.get_robustcov_results(cov_type=cov_type, use_self=True,
-> 1557 use_t=use_t, **cov_kwds)
1558 for key in kwargs:
1559 setattr(self, key, kwargs[key])
~\anaconda3\lib\site-packages\statsmodels\regression\linear_model.py in get_robustcov_results(self, cov_type, use_t, **kwargs)
2490 res.cov_params_default = sw.cov_nw_panel(self, maxlags, groupidx,
2491 weights_func=weights_func,
-> 2492 use_correction=use_correction)
2493 res.cov_kwds['description'] = descriptions['HAC-Panel']
2494
~\anaconda3\lib\site-packages\statsmodels\stats\sandwich_covariance.py in cov_nw_panel(results, nlags, groupidx, weights_func, use_correction)
785 xu, hessian_inv = _get_sandwich_arrays(results)
786
--> 787 S_hac = S_nw_panel(xu, weights, groupidx)
788 cov_hac = _HCCM2(hessian_inv, S_hac)
789 if use_correction:
~\anaconda3\lib\site-packages\statsmodels\stats\sandwich_covariance.py in S_nw_panel(xw, weights, groupidx)
723 S = weights[0] * np.dot(xw.T, xw) #weights just for completeness
724 for lag in range(1, nlags+1):
--> 725 xw0, xwlag = lagged_groups(xw, lag, groupidx)
726 s = np.dot(xw0.T, xwlag)
727 S += weights[lag] * (s + s.T)
~\anaconda3\lib\site-packages\statsmodels\stats\sandwich_covariance.py in lagged_groups(x, lag, groupidx)
706
707 if out0 == []:
--> 708 raise ValueError('all groups are empty taking lags')
709 #return out0, out_lagged
710 return np.vstack(out0), np.vstack(out_lagged)
ValueError: all groups are empty taking lags
Answer 1:
I was looking for an example, and yours was very helpful.
The only problem with your code seems to be passing the same index as both time and groups in
cov_kwds={'time': dat.index, 'groups': dat.index, 'maxlags': 11}
With groups set to dat.index, every unique value of dat.index (in your case every quarter) is treated as a separate group. The same index is also used as the time indicator, so each group consists only of the observations from a single quarter and its time series is one period long. With only one period per group there is nothing to take lags over, hence the error.
Source: https://stackoverflow.com/questions/61465483/python-statsmodels-robust-cov-type-hac-panel-issue