How to remove a lower order parameter in a model when the higher order parameters remain?

余生分开走 2021-02-14 18:19

The problem: I cannot remove a lower order parameter (e.g., a main effects parameter) in a model as long as the higher order parameters (i.e., interactions) rem

2 answers
  •  暗喜 (OP)
    2021-02-14 19:00

    Here's a sort of answer; there is no way that I know of to formulate this model directly by the formula ...

    Construct data as above:

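    ## two 2-level factors; the two labels of B are recycled over all 100 rows
    d <- data.frame(A = rep(c("a1", "a2"), each = 50),
                    B = c("b1", "b2"), value = rnorm(100))
    ## sum-to-zero contrasts for unordered factors, polynomial for ordered ones
    options(contrasts = c("contr.sum", "contr.poly"))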
    

    Confirm original finding that just subtracting the factor from the formula doesn't work:

    m1 <- lm(value ~ A * B, data = d)
    coef(m1)
    ## (Intercept)          A1          B1       A1:B1 
    ## -0.23766309  0.04651298 -0.13019317 -0.06421580 
    
    m2 <- update(m1, .~. - A)
    coef(m2)
    ## (Intercept)          B1      Bb1:A1      Bb2:A1 
    ## -0.23766309 -0.13019317 -0.01770282  0.11072877 
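
    The reason the subtraction "doesn't work" is that R re-expands the interaction so that m2 is only a reparameterization of m1. A minimal sketch of how one might check this, assuming an arbitrary seed (set.seed(1) is my addition, so the numbers will not match the output above):

    ```r
    set.seed(1)  # assumption: arbitrary seed, not part of the original answer
    d <- data.frame(A = rep(c("a1", "a2"), each = 50),
                    B = c("b1", "b2"), value = rnorm(100))
    old <- options(contrasts = c("contr.sum", "contr.poly"))

    m1 <- lm(value ~ A * B, data = d)
    m2 <- update(m1, . ~ . - A)

    ## both design matrices still have 4 columns ...
    same_ncol <- ncol(model.matrix(m1)) == ncol(model.matrix(m2))
    ## ... and the two models give identical fitted values and residual SS
    same_fit <- isTRUE(all.equal(fitted(m1), fitted(m2)))
    same_rss <- isTRUE(all.equal(deviance(m1), deviance(m2)))

    options(old)  # restore the previous contrasts setting
    ```

    If all three flags are TRUE, no parameter has actually been removed; the A effect has just been folded into the two B-specific columns.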
    

    Formulate the new model matrix:

    X0 <- model.matrix(m1)
    ## drop only the A main-effect column (A1); the intercept column is kept
    X1 <- X0[,!colnames(X0) %in% "A1"]
    

    lm.fit allows direct specification of the model matrix:

    m3 <- lm.fit(x=X1,y=d$value)
    coef(m3)
    ## (Intercept)          B1       A1:B1 
    ## -0.2376631  -0.1301932  -0.0642158 
    

    This method only works for a few special cases that allow the model matrix to be specified explicitly (e.g. lm.fit, glm.fit).
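
    As a rough illustration of the glm.fit case, here is a sketch that feeds the same reduced model matrix to both functions. The seed and the Gaussian family are my assumptions, chosen only so that the two fits should agree:

    ```r
    set.seed(1)  # assumption: arbitrary seed for reproducibility
    d <- data.frame(A = rep(c("a1", "a2"), each = 50),
                    B = c("b1", "b2"), value = rnorm(100))
    old <- options(contrasts = c("contr.sum", "contr.poly"))

    X0 <- model.matrix(value ~ A * B, data = d)
    X1 <- X0[, colnames(X0) != "A1", drop = FALSE]  # remove the A main effect

    m3 <- lm.fit(x = X1, y = d$value)
    ## Gaussian family with identity link reproduces the least-squares fit
    g3 <- glm.fit(x = X1, y = d$value, family = gaussian())

    coef_lm  <- m3$coefficients
    coef_glm <- g3$coefficients

    options(old)  # restore the previous contrasts setting
    ```

    Both results are plain lists rather than "lm" or "glm" objects, so methods such as summary() or anova() are not available for them.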

    More generally:

    ## need to drop intercept column (or use -1 in the formula)
    X1 <- X1[,!colnames(X1) %in% "(Intercept)"]
    ## : will confuse things -- substitute something inert
    colnames(X1) <- gsub(":","_int_",colnames(X1))
    newf <- reformulate(colnames(X1),response="value")
    m4 <- lm(newf,data=data.frame(value=d$value,X1))
    coef(m4)
    ## (Intercept)          B1   A1_int_B1 
    ##  -0.2376631  -0.1301932  -0.0642158 
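
    One thing this buys you is a genuine "lm" object, so the usual model-comparison tools apply. A sketch of testing the dropped A main effect against the full model, again with a seed that is my own assumption:

    ```r
    set.seed(1)  # assumption: arbitrary seed for reproducibility
    d <- data.frame(A = rep(c("a1", "a2"), each = 50),
                    B = c("b1", "b2"), value = rnorm(100))
    old <- options(contrasts = c("contr.sum", "contr.poly"))

    m1 <- lm(value ~ A * B, data = d)
    X1 <- model.matrix(m1)
    X1 <- X1[, !colnames(X1) %in% c("(Intercept)", "A1"), drop = FALSE]
    colnames(X1) <- gsub(":", "_int_", colnames(X1))
    m4 <- lm(reformulate(colnames(X1), response = "value"),
             data = data.frame(value = d$value, X1))

    ## F test of the reduced model (no A main effect) against the full model
    cmp <- anova(m4, m1)
    ## for a single dropped column, F should equal the squared t statistic of A1
    t_A1 <- coef(summary(m1))["A1", "t value"]

    options(old)  # restore the previous contrasts setting
    ```

    With only one column removed, this F test carries the same information as the t test for A1 in summary(m1); the construction becomes more useful when several columns are dropped at once.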
    

    This approach has the disadvantage that it won't recognize multiple input variables as stemming from the same predictor (i.e., multiple factor levels from a more-than-2-level factor).
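
    To make that concrete, here is a sketch with a hypothetical 3-level factor A (the data frame d3 and the seed are invented for illustration). In the full model the interaction is one 2-df term, but after rebuilding the formula from column names it shows up as two unrelated 1-df terms:

    ```r
    set.seed(2)  # assumption: arbitrary seed; d3 is a made-up example
    d3 <- data.frame(A = rep(c("a1", "a2", "a3"), each = 40),
                     B = c("b1", "b2"), value = rnorm(120))
    old <- options(contrasts = c("contr.sum", "contr.poly"))

    m_full <- lm(value ~ A * B, data = d3)
    df_AB_full <- anova(m_full)["A:B", "Df"]  # interaction as a single term

    X <- model.matrix(m_full)
    ## drop the intercept column and both A main-effect columns
    X <- X[, !colnames(X) %in% c("(Intercept)", "A1", "A2"), drop = FALSE]
    colnames(X) <- gsub(":", "_int_", colnames(X))
    m_red <- lm(reformulate(colnames(X), response = "value"),
                data = data.frame(value = d3$value, X))

    ## each former column is now its own term in the model
    term_labels <- attr(terms(m_red), "term.labels")
    df_terms <- anova(m_red)$Df[seq_along(term_labels)]

    options(old)  # restore the previous contrasts setting
    ```

    So anova(), drop1() and similar term-wise tools will treat A1_int_B1 and A2_int_B1 separately, and any joint test of the interaction has to be set up by hand, for example by comparing two explicitly built models.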
