How can I force dropping intercept or equivalent in this linear model?

Problem

Consider the following table:

DB <- data.frame(
  Y  = rnorm(6),
  X1 = c(TRUE, TRUE, FALSE, TRUE, FALSE, FALSE),
  X2 = c(TRUE, FALSE, TRUE, FALSE, TRUE, TRUE)
)
           Y    X1    X2
1  1.8376852  TRUE  TRUE
2 -2.1173739  TRUE FALSE
3  1.3054450 FALSE  TRUE
4 -0.3476706  TRUE FALSE
5  1.3219099 FALSE  TRUE
6  0.6781750 FALSE  TRUE

I'd like to explain my quantitative variable Y by two binary variables (TRUE or FALSE), without an intercept.

The reason for this choice is that, in my study, X1=FALSE and X2=FALSE can never be observed at the same time, so it makes no sense for this combination to have a mean other than 0.

With intercept

m1 <- lm(Y~X1+X2, data=DB)
summary(m1)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)  
(Intercept)  -1.9684     1.0590  -1.859   0.1600  
X1TRUE        0.7358     0.9032   0.815   0.4749  
X2TRUE        3.0702     0.9579   3.205   0.0491 *

Without intercept

m0 <- lm(Y~0+X1+X2, data=DB)
summary(m0)

Coefficients:
        Estimate Std. Error t value Pr(>|t|)  
X1FALSE  -1.9684     1.0590  -1.859   0.1600  
X1TRUE   -1.2325     0.5531  -2.229   0.1122  
X2TRUE    3.0702     0.9579   3.205   0.0491 *

I can't explain why two coefficients (X1FALSE and X1TRUE) are estimated for the variable X1. The X1FALSE coefficient seems to be equivalent to the intercept in the model with an intercept.

Same results

When we display the predictions for all combinations of the two variables, the two models give the same values.

# Predict Y for every combination of the observed X1 and X2 values
DisplayLevel <- function(m){
  R <- outer(
    unique(DB$X1),
    unique(DB$X2),
    function(a, b) predict(m, data.frame(X1=a, X2=b))
  )
  colnames(R) <- paste0('X2:', unique(DB$X2))
  rownames(R) <- paste0('X1:', unique(DB$X1))
  return(R)
}

DisplayLevel(m1)
          X2:TRUE  X2:FALSE
X1:TRUE  1.837685 -1.232522
X1:FALSE 1.101843 -1.968364

DisplayLevel(m0)
          X2:TRUE  X2:FALSE
X1:TRUE  1.837685 -1.232522
X1:FALSE 1.101843 -1.968364

So the two models are equivalent.
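
The equivalence can also be checked numerically without going through predictions. The following is a minimal sketch on a fresh copy of the data; the seed and the `_chk` names are illustrative assumptions, not part of the original example. It compares the fitted values of the two models and checks that the intercept of the first model reappears as the X1FALSE coefficient of the second.

```r
set.seed(1)  # arbitrary seed, only so that this sketch is reproducible
DB_chk <- data.frame(
  Y  = rnorm(6),
  X1 = c(TRUE, TRUE, FALSE, TRUE, FALSE, FALSE),
  X2 = c(TRUE, FALSE, TRUE, FALSE, TRUE, TRUE)
)

m1_chk <- lm(Y ~ X1 + X2, data = DB_chk)      # with intercept
m0_chk <- lm(Y ~ 0 + X1 + X2, data = DB_chk)  # "without" intercept

# Same fitted values: both design matrices span the same column space
all.equal(fitted(m1_chk), fitted(m0_chk))

# The intercept of m1_chk is the X1FALSE coefficient of m0_chk
all.equal(unname(coef(m1_chk)["(Intercept)"]),
          unname(coef(m0_chk)["X1FALSE"]))
```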

Question

My question is: can we estimate just one coefficient for the first effect? Can we force R to assign a value of 0 to the combination X1=FALSE and X2=FALSE?

Answer 1:

Yes, we can, by coercing the logical columns to numeric:

DB <- as.data.frame(data.matrix(DB))
## or you can do:
## DB$X1 <- as.integer(DB$X1)
## DB$X2 <- as.integer(DB$X2)

#            Y X1 X2
# 1 -0.5059575  1  1
# 2  1.3430388  1  0
# 3 -0.2145794  0  1
# 4 -0.1795565  1  0
# 5 -0.1001907  0  1
# 6  0.7126663  0  1

## a linear model without intercept
m0 <- lm(Y ~ 0 + X1 + X2, data = DB)

DisplayLevel(m0)
#             X2:1      X2:0
# X1:1  0.15967744 0.2489237
# X1:0 -0.08924625 0.0000000

I have explicitly coerced your TRUE/FALSE binary variables into numeric 1/0, so that lm() treats them as plain numeric covariates and applies no contrast coding.
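
To see why the coercion matters, it may help to compare the design matrices that R builds in each case. The sketch below uses a fresh copy of the data; the names DB_lgl, DB_num and m_num, as well as the seed, are illustrative assumptions. With logical columns and no intercept, the first term is expanded into a full set of dummy columns (X1FALSE and X1TRUE), whereas numeric 0/1 columns are used as they are, so the X1 = 0, X2 = 0 combination is predicted as 0.

```r
set.seed(1)  # arbitrary seed, only so that this sketch is reproducible
DB_lgl <- data.frame(
  Y  = rnorm(6),
  X1 = c(TRUE, TRUE, FALSE, TRUE, FALSE, FALSE),
  X2 = c(TRUE, FALSE, TRUE, FALSE, TRUE, TRUE)
)
# Same data with the logical columns coerced to integer 1/0
DB_num <- transform(DB_lgl, X1 = as.integer(X1), X2 = as.integer(X2))

# Logical predictors: the first term gets two dummy columns
colnames(model.matrix(Y ~ 0 + X1 + X2, data = DB_lgl))

# Numeric predictors: one column per variable, no contrasts involved
colnames(model.matrix(Y ~ 0 + X1 + X2, data = DB_num))

# With numeric coding, the X1 = 0, X2 = 0 cell has a predicted mean of 0
m_num <- lm(Y ~ 0 + X1 + X2, data = DB_num)
predict(m_num, data.frame(X1 = 0, X2 = 0))
```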

The data in my answer differ from yours because you did not call set.seed(?) before rnorm() for reproducibility, but this is not an issue here.

Source: https://stackoverflow.com/questions/38129669/how-can-i-force-dropping-intercept-or-equivalent-in-this-linear-model

Tags

regression

linear-regression

anova