Different Robust Standard Errors of Logit Regression in Stata and R

你的背包
你的背包 2020-12-05 03:21

I am trying to replicate a logit regression from Stata in R. In Stata I use the option "robust" to obtain robust (heteroscedasticity-consistent) standard errors, but the standard errors I get in R do not match.

1 Answer
  • 2020-12-05 04:08

    The default so-called "robust" standard errors in Stata correspond to what sandwich() from the package of the same name computes. The only difference is how the finite-sample adjustment is done. By default, sandwich(...) applies no finite-sample adjustment at all, i.e., the meat of the sandwich is scaled by 1/n, where n is the number of observations. Alternatively, sandwich(..., adjust = TRUE) scales the meat by 1/(n - k), where k is the number of regressors. Stata scales it by 1/(n - 1).

    Of course, asymptotically these do not differ at all. And except for a few special cases (e.g., OLS linear regression), there is no argument that either 1/(n - k) or 1/(n - 1) works "correctly" in finite samples (e.g., yields unbiasedness) — at least not to the best of my knowledge.
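    These scalings differ only by simple multiplicative factors, so they are easy to check by hand. The following base-R sketch (on made-up simulated data, not the asker's) builds the sandwich estimator for a logit fit from scratch and shows how the three variants relate:

```r
## Sketch with simulated data: the three finite-sample scalings of the
## sandwich estimator for a logistic regression, built in base R.
set.seed(1)
x <- rnorm(100)
y <- rbinom(100, 1, plogis(0.5 * x))
fit <- glm(y ~ x, family = binomial)

X <- model.matrix(fit)
n <- nobs(fit)
k <- ncol(X)

## Score (estimating-function) contributions per observation:
## for a canonical-link GLM these are (y - mu) * x_i.
scores <- X * residuals(fit, type = "response")

## Bread: n times the usual model-based covariance matrix.
bread_mat <- vcov(fit) * n
## Meat with the default 1/n scaling (what sandwich() uses by default):
meat_hc0 <- crossprod(scores) / n

vc_default <- bread_mat %*% meat_hc0 %*% bread_mat / n  # sandwich() default
vc_adjust  <- vc_default * n / (n - k)                  # sandwich(..., adjust = TRUE)
vc_stata   <- vc_default * n / (n - 1)                  # Stata's vce(robust)

sqrt(diag(vc_stata))  # Stata-style robust standard errors
```

    All three variants share the same bread and meat; only the scalar in front differs, which is why a one-line wrapper function suffices to match Stata.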

    So to obtain the same results as in Stata you can do:

    ## requires the sandwich and lmtest packages
    library(sandwich)
    library(lmtest)

    ## rescale sandwich() to Stata's 1/(n - 1) finite-sample adjustment
    sandwich1 <- function(object, ...) sandwich(object) * nobs(object) / (nobs(object) - 1)
    coeftest(myfit, vcov = sandwich1)
    

    This yields

    z test of coefficients:
    
                  Estimate Std. Error z value  Pr(>|z|)    
    (Intercept) -3.9899791  1.1380890 -3.5059 0.0004551 ***
    gre          0.0022644  0.0011027  2.0536 0.0400192 *  
    gpa          0.8040375  0.3451359  2.3296 0.0198259 *  
    rank2       -0.6754429  0.3144686 -2.1479 0.0317228 *  
    rank3       -1.3402039  0.3445257 -3.8900 0.0001002 ***
    rank4       -1.5514637  0.4160544 -3.7290 0.0001922 ***
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    

    And just for the record: in the binary response case, these "robust" standard errors are not robust against anything. Provided that the model is correctly specified, they are consistent and it is ok to use them, but they do not guard against any misspecification of the model. The basic assumption for sandwich standard errors to work is that the model equation (or, more precisely, the corresponding score function) is correctly specified, while the rest of the model may be misspecified. In a binary regression, however, there is no room for such misspecification: the model equation consists only of the mean (= success probability), and each observation's likelihood contribution is that mean or 1 minus it, respectively. This is in contrast to linear or count-data regression, where there may be heteroskedasticity, overdispersion, etc.
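    To illustrate the contrast with count data, here is a small simulated sketch (purely illustrative, not from the question) of Poisson regression on overdispersed counts. There the model-based standard errors understate the uncertainty, while sandwich standard errors — built from scratch in base R as before — account for the excess variance:

```r
## Simulated overdispersed counts: negative-binomial data fit with a
## (misspecified) Poisson model. The mean equation is still correct,
## so the sandwich covariance remains valid while vcov(fit) does not.
set.seed(2)
x <- rnorm(500)
mu <- exp(1 + 0.5 * x)
y <- rnbinom(500, mu = mu, size = 1)   # variance = mu + mu^2 >> mu
fit <- glm(y ~ x, family = poisson)

X <- model.matrix(fit)
n <- nobs(fit)
scores <- X * residuals(fit, type = "response")   # (y - mu) * x_i
bread_mat <- vcov(fit) * n
vc_sandwich <- bread_mat %*% (crossprod(scores) / n) %*% bread_mat / n

## Model-based vs sandwich standard errors side by side:
cbind(model    = sqrt(diag(vcov(fit))),
      sandwich = sqrt(diag(vc_sandwich)))
```

    With this much overdispersion, the sandwich standard errors come out substantially larger than the model-based ones — exactly the situation that has no analogue in binary regression.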
