Does Quasi Separation matter in R binomial GLM?

Submitted by 自古美人都是妖i on 2019-12-02 17:16:32

Question


I am learning how quasi-separation affects a binomial GLM in R, and I am starting to think that it does not matter in some circumstances.

In my understanding, we say that the data have quasi-separation when some linear combination of factor levels can completely identify failure/non-failure.

So I created an artificial dataset with quasi-separation in R:

fail <- c(100,100,100,100)
nofail <- c(100,100,0,100)
x1 <- c(1,0,1,0)
x2 <- c(0,0,1,1)
data <- data.frame(fail,nofail,x1,x2)
rownames(data) <- paste("obs",1:4)

Then when x1=1 and x2=1 (obs 3) the data always fail (fail = 100, nofail = 0). For these data, my covariate matrix has three columns: intercept, x1 and x2.
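To see which covariate pattern has only one outcome, the observed failure proportion per row can be tabulated. This is a minimal sketch that rebuilds the same data frame; the column `prop_fail` and the vector `boundary` are names introduced here for illustration only.

```r
# Rebuild the artificial dataset from the question
fail   <- c(100, 100, 100, 100)
nofail <- c(100, 100, 0, 100)
x1     <- c(1, 0, 1, 0)
x2     <- c(0, 0, 1, 1)
data   <- data.frame(fail, nofail, x1, x2)

# Observed proportion of failures in each covariate pattern
# (prop_fail is an illustrative name, not part of the original code)
data$prop_fail <- data$fail / (data$fail + data$nofail)
print(data)

# Rows whose observed proportion sits exactly on the boundary (0 or 1)
boundary <- data$prop_fail %in% c(0, 1)
print(which(boundary))
```

Only the row with x1=1 and x2=1 lies on the boundary; the other three patterns are split evenly between failures and non-failures.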

In my understanding, quasi-separation results in an infinite estimate, so the glm fit should fail. However, the following glm fit does NOT fail:

summary(glm(cbind(fail,nofail)~x1+x2,data=data,family=binomial))

The result is:

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  -0.4342     0.1318  -3.294 0.000986 ***
x1            0.8684     0.1660   5.231 1.69e-07 ***
x2            0.8684     0.1660   5.231 1.69e-07 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

The standard errors seem very reasonable even with the quasi-separation. Could anyone tell me why the quasi-separation is NOT affecting the glm fit result?


Answer 1:


You have constructed an interesting example, but the model you fit does not actually examine the situation you describe as quasi-separation. When you say "when x1=1 and x2=1 (obs 3) the data always fails", you are implying the need for an interaction term in the model. Notice that this produces a "more interesting" result:

> summary(glm(cbind(fail,nofail)~x1*x2,data=data,family=binomial))

Call:
glm(formula = cbind(fail, nofail) ~ x1 * x2, family = binomial, 
    data = data)

Deviance Residuals: 
[1]  0  0  0  0

Coefficients:
              Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.367e-17  1.414e-01   0.000        1
x1           2.675e-17  2.000e-01   0.000        1
x2           2.965e-17  2.000e-01   0.000        1
x1:x2        2.731e+01  5.169e+04   0.001        1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1.2429e+02  on 3  degrees of freedom
Residual deviance: 2.7538e-10  on 0  degrees of freedom
AIC: 25.257

Number of Fisher Scoring iterations: 22

One generally needs to be very suspicious of beta coefficients as large as 2.731e+01. The implied odds ratio is:

> exp(2.731e+01)
[1] 725407933166

In this working environment there really is no material difference between Inf and 725,407,933,166.
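The contrast between the two fits can be checked directly. The following is a rough sketch, not a general separation diagnostic: it refits both models, compares the fitted failure probability for obs 3, and flags coefficients that are large with a very large standard error. The object names (`fit_add`, `fit_int`, `p_add`, `p_int`, `suspect`) and the cut-offs of 10 and 100 are arbitrary choices made for this illustration.

```r
# Rebuild the artificial dataset from the question
fail   <- c(100, 100, 100, 100)
nofail <- c(100, 100, 0, 100)
x1     <- c(1, 0, 1, 0)
x2     <- c(0, 0, 1, 1)
data   <- data.frame(fail, nofail, x1, x2)

# Additive model (as in the question) and interaction model (as in the answer)
fit_add <- glm(cbind(fail, nofail) ~ x1 + x2, data = data, family = binomial)
fit_int <- glm(cbind(fail, nofail) ~ x1 * x2, data = data, family = binomial)

# Fitted failure probability for obs 3 (x1 = 1, x2 = 1) under each model
p_add <- unname(fitted(fit_add)[3])
p_int <- unname(fitted(fit_int)[3])
print(c(additive = p_add, interaction = p_int))

# Flag coefficients that are both large and very imprecisely estimated.
# The thresholds 10 and 100 are arbitrary, chosen only for this example.
est     <- coef(summary(fit_int))
suspect <- rownames(est)[abs(est[, "Estimate"]) > 10 &
                         est[, "Std. Error"] > 100]
print(suspect)
```

The additive model cannot reproduce the all-failure cell with two main effects alone, so its fitted probability for obs 3 stays well below 1 and every estimate remains finite. The interaction model is saturated and matches the observed proportion of 1, which is what pushes the x1:x2 coefficient toward infinity.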



Source: https://stackoverflow.com/questions/37558180/does-quasi-separation-matter-in-r-binomial-glm
