R calculate robust standard errors (vcovHC) for lm model with singularities

Submitted by 爷,独闯天下 on 2019-12-08 17:30:39

Question


In R, how can I calculate robust standard errors using vcovHC() when some coefficients are dropped due to singularities? The standard lm function has no trouble calculating the conventional standard errors for all coefficients that are actually estimated, but vcovHC() throws an error: "Error in bread. %*% meat. : non-conformable arguments".

(The actual data I'm using is a bit more complicated. It is a model with two different fixed effects, and I run into local singularities which I cannot simply get rid of, or at least I would not know how. Of the two fixed effects, the first factor has 150 levels and the second has 142 levels, and there are 9 singularities in total, which result from the fact that the data was collected in ten blocks.)

Here is my output:

Call:
lm(formula = one ~ two + three + Jan + Feb + Mar + Apr + May + 
Jun + Jul + Aug + Sep + Oct + Nov + Dec, data = dat)

Residuals:
    Min      1Q  Median      3Q     Max 
-130.12  -60.95    0.08   61.05  137.35 

Coefficients: (1 not defined because of singularities)
              Estimate Std. Error t value Pr(>|t|)    
(Intercept) 1169.74313   57.36807  20.390   <2e-16 ***
two           -0.07963    0.06720  -1.185    0.237    
three         -0.04053    0.06686  -0.606    0.545    
Jan            8.10336   22.05552   0.367    0.714    
Feb            0.44025   22.11275   0.020    0.984    
Mar           19.65066   22.02454   0.892    0.373    
Apr          -13.19779   22.02886  -0.599    0.550    
May           15.39534   22.10445   0.696    0.487    
Jun          -12.50227   22.07013  -0.566    0.572    
Jul          -20.58648   22.06772  -0.933    0.352    
Aug           -0.72223   22.36923  -0.032    0.974    
Sep           12.42204   22.09296   0.562    0.574    
Oct           25.14836   22.04324   1.141    0.255    
Nov           18.13337   22.08717   0.821    0.413    
Dec                 NA         NA      NA       NA    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Residual standard error: 69.63 on 226 degrees of freedom
Multiple R-squared: 0.04878,    Adjusted R-squared: -0.005939 
F-statistic: 0.8914 on 13 and 226 DF,  p-value: 0.5629 

> model$se <- vcovHC(model)
Error in bread. %*% meat. : non-conformable arguments

Here is a minimal code snippet to reproduce the error.

library(sandwich)
set.seed(101)
dat<-data.frame(one=c(sample(1000:1239)),
              two=c(sample(200:439)),
              three=c(sample(600:839)),
              Jan=c(rep(1,20),rep(0,220)),
              Feb=c(rep(0,20),rep(1,20),rep(0,200)),
              Mar=c(rep(0,40),rep(1,20),rep(0,180)),
              Apr=c(rep(0,60),rep(1,20),rep(0,160)),
              May=c(rep(0,80),rep(1,20),rep(0,140)),
              Jun=c(rep(0,100),rep(1,20),rep(0,120)),
              Jul=c(rep(0,120),rep(1,20),rep(0,100)),
              Aug=c(rep(0,140),rep(1,20),rep(0,80)),
              Sep=c(rep(0,160),rep(1,20),rep(0,60)),
              Oct=c(rep(0,180),rep(1,20),rep(0,40)),
              Nov=c(rep(0,200),rep(1,20),rep(0,20)),
              Dec=c(rep(0,220),rep(1,20))) 
model <- lm(one ~ two + three + Jan + Feb + Mar + Apr + May + Jun + Jul + Aug + Sep + Oct + Nov + Dec, data=dat)
summary(model)
model$se <- vcovHC(model)

Answer 1:


A model with singularities is over-parameterized, and it is better to fix the specification than to work around it. In your case, you have 12 coefficients for the 12 months plus the global intercept, so there are 13 coefficients for only 12 parameters that can actually be estimated (in every row exactly one month dummy equals 1, so the 12 dummies add up to the intercept column). What you actually want is to drop the global intercept, so that each month gets its own intercept:

> model <- lm(one ~ 0 + two + three + Jan + Feb + Mar + Apr + May + Jun + Jul + Aug + Sep + Oct + Nov + Dec, data=dat)
> summary(model)

Call:
lm(formula = one ~ 0 + two + three + Jan + Feb + Mar + Apr + 
    May + Jun + Jul + Aug + Sep + Oct + Nov + Dec, data = dat)

Residuals:
     Min       1Q   Median       3Q      Max 
-133.817  -55.636    3.329   56.768  126.772 

Coefficients:
        Estimate Std. Error t value Pr(>|t|)    
two     -0.09670    0.06621  -1.460    0.146    
three    0.02446    0.06666   0.367    0.714    
Jan   1130.05812   52.79625  21.404   <2e-16 ***
Feb   1121.32904   55.18864  20.318   <2e-16 ***
Mar   1143.50310   53.59603  21.336   <2e-16 ***
Apr   1143.95365   54.99724  20.800   <2e-16 ***
May   1136.36429   53.38218  21.287   <2e-16 ***
Jun   1129.86010   53.85865  20.978   <2e-16 ***
Jul   1105.10045   54.94940  20.111   <2e-16 ***
Aug   1147.47152   54.57201  21.027   <2e-16 ***
Sep   1139.42205   53.58611  21.263   <2e-16 ***
Oct   1117.75075   55.35703  20.192   <2e-16 ***
Nov   1129.20208   53.54934  21.087   <2e-16 ***
Dec   1149.55556   53.52499  21.477   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 69.81 on 226 degrees of freedom
Multiple R-squared:  0.9964,    Adjusted R-squared:  0.9961 
F-statistic:  4409 on 14 and 226 DF,  p-value: < 2.2e-16

After that it is an ordinary full-rank model, so you shouldn't have any problems with vcovHC.
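To make the point above concrete, here is a minimal sketch that checks the rank deficiency and then computes robust standard errors on the intercept-free model. The data is rebuilt in a compact form that is meant to match the question's example (12 blocks of 20 rows); the names fit_full, fit_noint, vc and robust_se are only illustrative, and the numbers you get depend on your R version's random number generator, so none are shown here.

```r
library(sandwich)  # provides vcovHC()

# Rebuild the example data: 12 blocks of 20 rows, one block per month.
set.seed(101)
month <- rep(month.abb, each = 20)
dat <- data.frame(
  one   = sample(1000:1239),
  two   = sample(200:439),
  three = sample(600:839),
  sapply(month.abb, function(m) as.integer(month == m))  # columns Jan..Dec
)

# Every row has exactly one month dummy equal to 1, so the twelve
# dummies add up to the intercept column.
stopifnot(all(rowSums(dat[month.abb]) == 1))

fit_full  <- lm(one ~ ., data = dat)      # intercept + 12 dummies
fit_noint <- lm(one ~ 0 + ., data = dat)  # 12 dummies, no intercept

sum(is.na(coef(fit_full)))   # one coefficient is aliased (reported as NA)
anyNA(coef(fit_noint))       # no aliased coefficients without the intercept

# Heteroskedasticity-consistent covariance; HC3 is the default type in sandwich.
vc <- vcovHC(fit_noint)
robust_se <- sqrt(diag(vc))
cbind(estimate = coef(fit_noint), robust_se)
```

If you prefer a full coefficient table with robust t-tests, the lmtest package's coeftest(fit_noint, vcov. = vc) is the usual companion to sandwich.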




Answer 2:


What you seem to be aiming at is a fixed effects estimation. Although this question was raised a while ago, I ran into the same problem, and here is my solution. Fixed effects can be controlled for by including a + factor() term in your estimation equation.

So I created an additional column first:

# create an additional column in your data;
# it holds the month name itself, not a dummy for each month
month_names <- c("Jan", "Feb", "Mar", "Apr", "May", "Jun",
                 "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")
# for each row, pick the name of the dummy column that equals 1
# (assumes exactly one month dummy is 1 in every row)
dat$month <- month_names[max.col(as.matrix(dat[month_names]), ties.method = "first")]

This column can now be used as the fixed-effect factor in the regression equation:

> # you can use the created column as a fixed-effect factor in your regression
> model_A <- lm(one ~ two + three + factor(month), data=dat)
> summary(model_A)

Call:
lm(formula = one ~ two + three + factor(month), data = dat)

Residuals:
     Min       1Q   Median       3Q      Max 
-133.817  -55.636    3.329   56.768  126.772 

Coefficients:
                   Estimate Std. Error t value Pr(>|t|)    
(Intercept)      1143.95365   54.99724  20.800   <2e-16 ***
two                -0.09670    0.06621  -1.460   0.1455    
three               0.02446    0.06666   0.367   0.7141    
factor(month)Aug    3.51788   22.09948   0.159   0.8737    
factor(month)Dec    5.60192   22.41204   0.250   0.8029    
factor(month)Feb  -22.62460   22.10889  -1.023   0.3072    
factor(month)Jan  -13.89553   22.25117  -0.624   0.5329    
factor(month)Jul  -38.85320   22.13980  -1.755   0.0806 .  
factor(month)Jun  -14.09355   22.18707  -0.635   0.5259    
factor(month)Mar   -0.45055   22.13638  -0.020   0.9838    
factor(month)May   -7.58935   22.14137  -0.343   0.7321    
factor(month)Nov  -14.75156   22.27288  -0.662   0.5084    
factor(month)Oct  -26.20290   22.09416  -1.186   0.2369    
factor(month)Sep   -4.53159   22.26334  -0.204   0.8389    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 69.81 on 226 degrees of freedom
Multiple R-squared:  0.04381,   Adjusted R-squared:  -0.01119 
F-statistic: 0.7966 on 13 and 226 DF,  p-value: 0.6635

> #and also do the same without intercept if so needed
> model_B <- lm(one ~ 0 + two + three + factor(month), data=dat)
> summary(model_B)

Call:
lm(formula = one ~ 0 + two + three + factor(month), data = dat)

Residuals:
     Min       1Q   Median       3Q      Max 
-133.817  -55.636    3.329   56.768  126.772 

Coefficients:
                   Estimate Std. Error t value Pr(>|t|)    
two                -0.09670    0.06621  -1.460    0.146    
three               0.02446    0.06666   0.367    0.714    
factor(month)Apr 1143.95365   54.99724  20.800   <2e-16 ***
factor(month)Aug 1147.47152   54.57201  21.027   <2e-16 ***
factor(month)Dec 1149.55556   53.52499  21.477   <2e-16 ***
factor(month)Feb 1121.32904   55.18864  20.318   <2e-16 ***
factor(month)Jan 1130.05812   52.79625  21.404   <2e-16 ***
factor(month)Jul 1105.10045   54.94940  20.111   <2e-16 ***
factor(month)Jun 1129.86010   53.85865  20.978   <2e-16 ***
factor(month)Mar 1143.50310   53.59603  21.336   <2e-16 ***
factor(month)May 1136.36429   53.38218  21.287   <2e-16 ***
factor(month)Nov 1129.20208   53.54934  21.087   <2e-16 ***
factor(month)Oct 1117.75075   55.35703  20.192   <2e-16 ***
factor(month)Sep 1139.42205   53.58611  21.263   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 69.81 on 226 degrees of freedom
Multiple R-squared:  0.9964,    Adjusted R-squared:  0.9961 
F-statistic:  4409 on 14 and 226 DF,  p-value: < 2.2e-16

This lets you run a regular OLS regression on panel data.
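If, as in the situation described in the question, the singularities cannot simply be removed by changing the formula, one option is to build the sandwich estimator by hand and restrict it to the coefficients that lm() actually estimated. The following is only a sketch under stated assumptions: robust_vcov_lm is a hypothetical helper written for this answer (it is not part of sandwich), it covers only the HC0 and HC3 variants, and it assumes an unweighted lm fit with no rows dropped for missing values. Newer versions of sandwich may handle aliased coefficients directly, so it is also worth retrying vcovHC(model) after updating the package.

```r
# Hypothetical helper, not part of sandwich: HC0 or HC3 covariance matrix for
# an unweighted lm fit, computed only for the non-aliased coefficients.
robust_vcov_lm <- function(fit, type = c("HC3", "HC0")) {
  type <- match.arg(type)
  keep <- !is.na(coef(fit))                      # FALSE for aliased coefficients
  X <- model.matrix(fit)[, keep, drop = FALSE]   # full-rank part of the design
  e <- residuals(fit)
  XtX_inv <- solve(crossprod(X))                 # (X'X)^{-1}, the "bread"
  if (type == "HC3") {
    h <- rowSums((X %*% XtX_inv) * X)            # leverages (hat values)
    e <- e / (1 - h)                             # HC3 rescaling of residuals
  }
  meat <- crossprod(X * e)                       # X' diag(e^2) X
  XtX_inv %*% meat %*% XtX_inv
}

# Example data in the same layout as the question: one dummy per month.
set.seed(101)
month <- rep(month.abb, each = 20)
dat <- data.frame(
  one   = sample(1000:1239),
  two   = sample(200:439),
  three = sample(600:839),
  sapply(month.abb, function(m) as.integer(month == m))
)

model <- lm(one ~ ., data = dat)   # intercept + 12 dummies: Dec is aliased
vc <- robust_vcov_lm(model)        # covariance for the 14 estimated coefficients
robust_se <- sqrt(diag(vc))
robust_se
```

The aliased coefficient simply has no row or column in the result, which mirrors how summary() reports NA for it. Because the dropped column is a linear combination of the others, the fitted values and the robust standard errors of two and three are the same as in the intercept-free parameterization; only the interpretation of the month coefficients changes.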



Source: https://stackoverflow.com/questions/9335621/r-calculate-robust-standard-errors-vcovhc-for-lm-model-with-singularities
