Question
In R, how can I calculate robust standard errors using vcovHC() when some coefficients are dropped due to singularities? The standard lm function seems to do fine calculating normal standard errors for all coefficients that are actually estimated, but vcovHC() throws an error: "Error in bread. %*% meat. : non-conformable arguments".
(The actual data I'm using is a bit more complicated. In fact, it is a model using two different fixed effects and I run into local singularities which I cannot simply get rid of. At least I would not know how. For the two fixed effects I'm using the first factor has 150 levels, the second has 142 levels and there are in total 9 singularities which result from the fact that the data was collected in ten blocks.)
Here is my output:
Call:
lm(formula = one ~ two + three + Jan + Feb + Mar + Apr + May +
Jun + Jul + Aug + Sep + Oct + Nov + Dec, data = dat)
Residuals:
    Min      1Q  Median      3Q     Max
-130.12  -60.95    0.08   61.05  137.35
Coefficients: (1 not defined because of singularities)
              Estimate Std. Error t value Pr(>|t|)
(Intercept) 1169.74313   57.36807  20.390   <2e-16 ***
two           -0.07963    0.06720  -1.185    0.237
three         -0.04053    0.06686  -0.606    0.545
Jan            8.10336   22.05552   0.367    0.714
Feb            0.44025   22.11275   0.020    0.984
Mar           19.65066   22.02454   0.892    0.373
Apr          -13.19779   22.02886  -0.599    0.550
May           15.39534   22.10445   0.696    0.487
Jun          -12.50227   22.07013  -0.566    0.572
Jul          -20.58648   22.06772  -0.933    0.352
Aug           -0.72223   22.36923  -0.032    0.974
Sep           12.42204   22.09296   0.562    0.574
Oct           25.14836   22.04324   1.141    0.255
Nov           18.13337   22.08717   0.821    0.413
Dec                 NA         NA      NA       NA
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 69.63 on 226 degrees of freedom
Multiple R-squared: 0.04878, Adjusted R-squared: -0.005939
F-statistic: 0.8914 on 13 and 226 DF, p-value: 0.5629
> model$se <- vcovHC(model)
Error in bread. %*% meat. : non-conformable arguments
Here is a minimal code snippet to reproduce the error.
library(sandwich)
set.seed(101)
dat<-data.frame(one=c(sample(1000:1239)),
two=c(sample(200:439)),
three=c(sample(600:839)),
Jan=c(rep(1,20),rep(0,220)),
Feb=c(rep(0,20),rep(1,20),rep(0,200)),
Mar=c(rep(0,40),rep(1,20),rep(0,180)),
Apr=c(rep(0,60),rep(1,20),rep(0,160)),
May=c(rep(0,80),rep(1,20),rep(0,140)),
Jun=c(rep(0,100),rep(1,20),rep(0,120)),
Jul=c(rep(0,120),rep(1,20),rep(0,100)),
Aug=c(rep(0,140),rep(1,20),rep(0,80)),
Sep=c(rep(0,160),rep(1,20),rep(0,60)),
Oct=c(rep(0,180),rep(1,20),rep(0,40)),
Nov=c(rep(0,200),rep(1,20),rep(0,20)),
Dec=c(rep(0,220),rep(1,20)))
model <- lm(one ~ two + three + Jan + Feb + Mar + Apr + May + Jun + Jul + Aug + Sep + Oct + Nov + Dec, data=dat)
summary(model)
model$se <- vcovHC(model)
Answer 1:
A model with singularities is a sign that something is over-specified, and it is better to fix the model than to work around it. In your case you have 12 coefficients for the 12 months, but also the global intercept. The 12 month dummies add up to the intercept column, so you are asking for 13 coefficients when only 12 parameters can actually be estimated. What you want is to disable the global intercept, so that each month gets its own intercept instead:
> model <- lm(one ~ 0 + two + three + Jan + Feb + Mar + Apr + May + Jun + Jul + Aug + Sep + Oct + Nov + Dec, data=dat)
> summary(model)
Call:
lm(formula = one ~ 0 + two + three + Jan + Feb + Mar + Apr +
May + Jun + Jul + Aug + Sep + Oct + Nov + Dec, data = dat)
Residuals:
     Min       1Q   Median       3Q      Max
-133.817  -55.636    3.329   56.768  126.772
Coefficients:
        Estimate Std. Error t value Pr(>|t|)
two     -0.09670    0.06621  -1.460    0.146
three    0.02446    0.06666   0.367    0.714
Jan   1130.05812   52.79625  21.404   <2e-16 ***
Feb   1121.32904   55.18864  20.318   <2e-16 ***
Mar   1143.50310   53.59603  21.336   <2e-16 ***
Apr   1143.95365   54.99724  20.800   <2e-16 ***
May   1136.36429   53.38218  21.287   <2e-16 ***
Jun   1129.86010   53.85865  20.978   <2e-16 ***
Jul   1105.10045   54.94940  20.111   <2e-16 ***
Aug   1147.47152   54.57201  21.027   <2e-16 ***
Sep   1139.42205   53.58611  21.263   <2e-16 ***
Oct   1117.75075   55.35703  20.192   <2e-16 ***
Nov   1129.20208   53.54934  21.087   <2e-16 ***
Dec   1149.55556   53.52499  21.477   <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 69.81 on 226 degrees of freedom
Multiple R-squared: 0.9964, Adjusted R-squared: 0.9961
F-statistic: 4409 on 14 and 226 DF, p-value: < 2.2e-16
After that it is an ordinary full-rank model, so you shouldn't have any problems with vcovHC.
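As a rough check of that claim, here is a minimal sketch that refits the intercept-free model and passes it to vcovHC(). It rebuilds the question's data in a more compact way (the loop over R's built-in month.abb is my shorthand, not the original code), and the names robust_vcov and robust_se are only illustrative.

```r
library(sandwich)

# Rebuild the question's example data compactly (same columns as the original).
set.seed(101)
dat <- data.frame(one = sample(1000:1239),
                  two = sample(200:439),
                  three = sample(600:839))
month_idx <- rep(1:12, each = 20)   # 20 consecutive rows per month
for (m in seq_along(month.abb)) {
  dat[[month.abb[m]]] <- as.numeric(month_idx == m)
}

# Intercept-free fit: two slopes plus twelve month-specific intercepts.
rhs <- paste(c("0", "two", "three", month.abb), collapse = " + ")
model <- lm(as.formula(paste("one ~", rhs)), data = dat)

# Heteroskedasticity-consistent covariance matrix (vcovHC defaults to type "HC3").
robust_vcov <- vcovHC(model)
robust_se <- sqrt(diag(robust_vcov))

# Compare classical and robust standard errors side by side.
print(cbind(estimate = coef(model),
            classical_se = sqrt(diag(vcov(model))),
            robust_se = robust_se))
```

The covariance matrix is kept in its own variable here rather than assigned into model$se, so the fitted lm object stays unmodified.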
Answer 2:
What you seem to be aiming at is a fixed-effects estimation. Although this question was raised a while ago, I ran into the same problem, so here is my solution.
Fixed effects can be controlled for by including a + factor() term in your estimation equation, so I first created an additional column:
# create an additional column in your data
dat$month <- "0"
#this column will contain the month, not a dummy for months
for (i in 1:length(dat$one)){
if (dat[i,"Jan"]==1){
dat[i,"month"]<- "Jan"}
if (dat[i,"Feb"]==1){
dat[i,"month"]<- "Feb"}
if (dat[i,"Mar"]==1){
dat[i,"month"]<- "Mar"}
if (dat[i,"Apr"]==1){
dat[i,"month"]<- "Apr"}
if (dat[i,"May"]==1){
dat[i,"month"]<- "May"}
if (dat[i,"Jun"]==1){
dat[i,"month"]<- "Jun"}
if (dat[i,"Jul"]==1){
dat[i,"month"]<- "Jul"}
if (dat[i,"Aug"]==1){
dat[i,"month"]<- "Aug"}
if (dat[i,"Sep"]==1){
dat[i,"month"]<- "Sep"}
if (dat[i,"Oct"]==1){
dat[i,"month"]<- "Oct"}
if (dat[i,"Nov"]==1){
dat[i,"month"]<- "Nov"}
if (dat[i,"Dec"]==1){
dat[i,"month"]<- "Dec"}
}
i <- NULL
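The same column can probably be built without a loop. A minimal sketch, assuming every row has exactly one month dummy equal to 1 and that the dummy columns are named like R's built-in month.abb (both hold for the example data, which is rebuilt compactly here); the name dummy_matrix is only illustrative:

```r
# Rebuild the question's example data compactly (same columns as the original).
set.seed(101)
dat <- data.frame(one = sample(1000:1239),
                  two = sample(200:439),
                  three = sample(600:839))
month_idx <- rep(1:12, each = 20)   # 20 consecutive rows per month
for (m in seq_along(month.abb)) {
  dat[[month.abb[m]]] <- as.numeric(month_idx == m)
}

# For each row, find the position of the dummy that is set and look up its label.
dummy_matrix <- as.matrix(dat[, month.abb])
dat$month <- month.abb[max.col(dummy_matrix, ties.method = "first")]

# Each month should appear 20 times.
print(table(dat$month))
```

Unlike the loop, this version never leaves the placeholder "0" in the column, so it should only be used when the one-dummy-per-row assumption really holds.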
This column now can be used as the fixed or constant effect factor in the regression equation:
> # you can use the created column as a fixed-effect factor in your regression
> model_A <- lm(one ~ two + three + factor(month), data=dat)
> summary(model_A)
Call:
lm(formula = one ~ two + three + factor(month), data = dat)
Residuals:
     Min       1Q   Median       3Q      Max
-133.817  -55.636    3.329   56.768  126.772
Coefficients:
                   Estimate Std. Error t value Pr(>|t|)
(Intercept)      1143.95365   54.99724  20.800   <2e-16 ***
two                -0.09670    0.06621  -1.460   0.1455
three               0.02446    0.06666   0.367   0.7141
factor(month)Aug    3.51788   22.09948   0.159   0.8737
factor(month)Dec    5.60192   22.41204   0.250   0.8029
factor(month)Feb  -22.62460   22.10889  -1.023   0.3072
factor(month)Jan  -13.89553   22.25117  -0.624   0.5329
factor(month)Jul  -38.85320   22.13980  -1.755   0.0806 .
factor(month)Jun  -14.09355   22.18707  -0.635   0.5259
factor(month)Mar   -0.45055   22.13638  -0.020   0.9838
factor(month)May   -7.58935   22.14137  -0.343   0.7321
factor(month)Nov  -14.75156   22.27288  -0.662   0.5084
factor(month)Oct  -26.20290   22.09416  -1.186   0.2369
factor(month)Sep   -4.53159   22.26334  -0.204   0.8389
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 69.81 on 226 degrees of freedom
Multiple R-squared: 0.04381, Adjusted R-squared: -0.01119
F-statistic: 0.7966 on 13 and 226 DF, p-value: 0.6635
> #and also do the same without intercept if so needed
> model_B <- lm(one ~ 0 + two + three + factor(month), data=dat)
> summary(model_B)
Call:
lm(formula = one ~ 0 + two + three + factor(month), data = dat)
Residuals:
     Min       1Q   Median       3Q      Max
-133.817  -55.636    3.329   56.768  126.772
Coefficients:
                   Estimate Std. Error t value Pr(>|t|)
two                -0.09670    0.06621  -1.460    0.146
three               0.02446    0.06666   0.367    0.714
factor(month)Apr 1143.95365   54.99724  20.800   <2e-16 ***
factor(month)Aug 1147.47152   54.57201  21.027   <2e-16 ***
factor(month)Dec 1149.55556   53.52499  21.477   <2e-16 ***
factor(month)Feb 1121.32904   55.18864  20.318   <2e-16 ***
factor(month)Jan 1130.05812   52.79625  21.404   <2e-16 ***
factor(month)Jul 1105.10045   54.94940  20.111   <2e-16 ***
factor(month)Jun 1129.86010   53.85865  20.978   <2e-16 ***
factor(month)Mar 1143.50310   53.59603  21.336   <2e-16 ***
factor(month)May 1136.36429   53.38218  21.287   <2e-16 ***
factor(month)Nov 1129.20208   53.54934  21.087   <2e-16 ***
factor(month)Oct 1117.75075   55.35703  20.192   <2e-16 ***
factor(month)Sep 1139.42205   53.58611  21.263   <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 69.81 on 226 degrees of freedom
Multiple R-squared: 0.9964, Adjusted R-squared: 0.9961
F-statistic: 4409 on 14 and 226 DF, p-value: < 2.2e-16
This lets you run a regular OLS regression on panel data.
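If singularities remain even with factor() terms, as in the question's real data with two crossed fixed effects collected in blocks, one possible workaround is to drop the aliased columns from the design matrix and refit, so that the model handed to vcovHC() has full column rank. The sketch below does this for the question's original singular model. It is not taken from either answer; the names singular_fit, X_full_rank and refit are illustrative, and it relies on lm() reporting aliased coefficients as NA.

```r
library(sandwich)

# Rebuild the question's example data compactly (same columns as the original).
set.seed(101)
dat <- data.frame(one = sample(1000:1239),
                  two = sample(200:439),
                  three = sample(600:839))
month_idx <- rep(1:12, each = 20)   # 20 consecutive rows per month
for (m in seq_along(month.abb)) {
  dat[[month.abb[m]]] <- as.numeric(month_idx == m)
}

# The singular model from the question: intercept plus all twelve month dummies.
rhs <- paste(c("two", "three", month.abb), collapse = " + ")
singular_fit <- lm(as.formula(paste("one ~", rhs)), data = dat)

# Keep only the design-matrix columns whose coefficients were estimated.
X <- model.matrix(singular_fit)
estimable <- !is.na(coef(singular_fit))
X_full_rank <- X[, estimable, drop = FALSE]
y <- model.response(model.frame(singular_fit))

# Refit on the reduced matrix; "0 +" avoids adding a second intercept column.
refit <- lm(y ~ 0 + X_full_rank)

# Robust covariance for the estimable coefficients, relabelled with the original names.
robust_vcov <- vcovHC(refit)
dimnames(robust_vcov) <- list(colnames(X_full_rank), colnames(X_full_rank))
robust_se <- sqrt(diag(robust_vcov))
print(robust_se)
```

The refit reproduces the estimable coefficients of the singular model, so the robust standard errors apply to exactly the coefficients that lm() reported; the dropped coefficient simply has no entry.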
Source: https://stackoverflow.com/questions/9335621/r-calculate-robust-standard-errors-vcovhc-for-lm-model-with-singularities