The problem: I cannot remove a lower order parameter (e.g., a main effects parameter) in a model as long as the higher order parameters (i.e., interactions) rem
Here's a sort of answer; there is no way that I know of to formulate this model directly by the formula ...
Construct data as above:
d <- data.frame(A = rep(c("a1", "a2"), each = 50),
B = c("b1", "b2"), value = rnorm(100))
options(contrasts=c('contr.sum','contr.poly'))
Confirm original finding that just subtracting the factor from the formula doesn't work:
m1 <- lm(value ~ A * B, data = d)
coef(m1)
## (Intercept) A1 B1 A1:B1
## -0.23766309 0.04651298 -0.13019317 -0.06421580
m2 <- update(m1, .~. - A)
coef(m2)
## (Intercept) B1 Bb1:A1 Bb2:A1
## -0.23766309 -0.13019317 -0.01770282 0.11072877
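A quick way to see why this doesn't work: R silently reparameterizes the model instead of constraining it, so the fit is unchanged. The following is a minimal self-contained sketch of that check; the seed and the names `same_size` and `same_fit` are my own additions for illustration, not part of the original example.

```r
## Sketch: removing A from the formula only reparameterizes the model.
set.seed(101)  # assumed seed, only so the check is reproducible
d <- data.frame(A = rep(c("a1", "a2"), each = 50),
                B = c("b1", "b2"), value = rnorm(100))
old <- options(contrasts = c("contr.sum", "contr.poly"))
m1 <- lm(value ~ A * B, data = d)
m2 <- update(m1, . ~ . - A)
## m2 still has four coefficients: the A effect is absorbed into
## the B-specific A contrasts (Bb1:A1, Bb2:A1)
same_size <- length(coef(m1)) == length(coef(m2))
## ... and the fitted values are identical, i.e. it is the same model
same_fit <- isTRUE(all.equal(fitted(m1), fitted(m2)))
options(old)  # restore the previous contrast settings
```

If both flags come out `TRUE`, the two formulas describe the same column space, which is why the A main effect has to be removed at the model-matrix level.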
Formulate the new model matrix:
X0 <- model.matrix(m1)
## drop the A main-effect column (A1) from the model matrix; the intercept stays
X1 <- X0[,!colnames(X0) %in% "A1"]
lm.fit allows direct specification of the model matrix:
m3 <- lm.fit(x=X1,y=d$value)
coef(m3)
## (Intercept) B1 A1:B1
## -0.2376631 -0.1301932 -0.0642158
This method only works for a few special cases that allow the model matrix to be specified explicitly (e.g. lm.fit, glm.fit).
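For what it's worth, the same reduced matrix can be handed to glm.fit as well. This is only a sketch under the assumption of a Gaussian family (so that it should agree with lm.fit); the seed and the names `fit_lm` and `fit_glm` are mine.

```r
## Sketch: pass the reduced model matrix to both lm.fit and glm.fit.
set.seed(101)  # assumed seed, only for reproducibility
d <- data.frame(A = rep(c("a1", "a2"), each = 50),
                B = c("b1", "b2"), value = rnorm(100))
old <- options(contrasts = c("contr.sum", "contr.poly"))
X0 <- model.matrix(value ~ A * B, data = d)
options(old)  # restore the previous contrast settings
## remove the A main-effect column, keep intercept, B1 and A1:B1
X1 <- X0[, colnames(X0) != "A1"]
fit_lm  <- lm.fit(x = X1, y = d$value)
fit_glm <- glm.fit(x = X1, y = d$value, family = gaussian())
## both return plain lists, so the coefficients are in $coefficients
fit_lm$coefficients
fit_glm$coefficients
```

Note that both functions return bare lists rather than "lm"/"glm" objects, so methods such as summary() or anova() are not directly available on the result.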
More generally:
## drop the intercept column, because lm() adds its own intercept (or keep it and use -1 in the formula)
X1 <- X1[,!colnames(X1) %in% "(Intercept)"]
## ":" in a column name would be parsed as an interaction in the formula -- substitute something inert
colnames(X1) <- gsub(":","_int_",colnames(X1))
newf <- reformulate(colnames(X1),response="value")
m4 <- lm(newf,data=data.frame(value=d$value,X1))
coef(m4)
## (Intercept) B1 A1_int_B1
## -0.2376631 -0.1301932 -0.0642158
This approach has the disadvantage that lm no longer recognizes multiple input variables as stemming from the same predictor (e.g., the several contrast columns of a factor with more than two levels).
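For a factor with more than two levels, one possible way to at least drop all of its main-effect columns together is to use the "assign" attribute of the model matrix, which maps each column to its term. This is a sketch only; the three-level data set `d3`, the seed, and the names `term_labels` and `col_term` are hypothetical additions, and the refitted model will still treat the remaining columns as separate numeric predictors.

```r
## Sketch: drop every main-effect column of A at once via the "assign" attribute.
set.seed(101)  # assumed seed, only for reproducibility
d3 <- data.frame(A = factor(rep(c("a1", "a2", "a3"), each = 40)),
                 B = factor(rep(c("b1", "b2"), times = 60)),
                 value = rnorm(120))
old <- options(contrasts = c("contr.sum", "contr.poly"))
X0 <- model.matrix(value ~ A * B, data = d3)
options(old)  # restore the previous contrast settings
## "assign" gives, per column, the index of its term (0 = intercept)
term_labels <- attr(terms(value ~ A * B), "term.labels")  # "A" "B" "A:B"
col_term <- c("(Intercept)", term_labels)[attr(X0, "assign") + 1]
## remove all columns belonging to the main effect of A (here A1 and A2)
X1 <- X0[, col_term != "A", drop = FALSE]
colnames(X1)
```

The resulting matrix can then be passed to lm.fit, or run through the gsub/reformulate steps above.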