Get p-value for group mean difference without refitting linear model with a new reference level

名媛妹妹 2021-01-25 06:42

When we have a linear model with a factor variable X (with levels A, B, and C)

y ~ factor(X) + Var2 + Var3 
        
3 Answers
  •  感情败类
    2021-01-25 07:25

    You are looking for a linear hypothesis test, that is, the p-value of some linear combination of regression coefficients. Building on my answer to "How to conduct linear hypothesis test on regression coefficients with a clustered covariance matrix?", which only considered a sum of coefficients, I extend the function LinearCombTest to handle the general case, where alpha holds the combination coefficients for the variables named in vars:

    LinearCombTest <- function (lmObject, vars, alpha, .vcov = NULL) {
      ## if `.vcov` missing, use the one returned by `lm`
      if (is.null(.vcov)) .vcov <- vcov(lmObject)
      ## estimated coefficients
      beta <- coef(lmObject)
      ## linear combination of `vars` with combination coefficients `alpha`
      LinearComb <- sum(beta[vars] * alpha)
      ## standard error of `LinearComb`: sqrt(t(alpha) %*% V %*% alpha)
      LinearComb_se <- sum(alpha * crossprod(.vcov[vars, vars], alpha)) ^ 0.5
      ## perform t-test on `LinearComb`
      tscore <- LinearComb / LinearComb_se
      pvalue <- 2 * pt(abs(tscore), lmObject$df.residual, lower.tail = FALSE)
      ## return a matrix
      form <- paste0("(", paste(alpha, vars, sep = " * "), ")")
      form <- paste0(paste0(form, collapse = " + "), " = 0")
      matrix(c(LinearComb, LinearComb_se, tscore, pvalue), nrow = 1L,
             dimnames = list(form, c("Estimate", "Std. Error", "t value", "Pr(>|t|)")))
    }
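
    As a reading aid, the quantities computed inside LinearCombTest can be written out as below. The symbols are my own notation, not part of the original answer: \hat\beta_S is the sub-vector of estimated coefficients selected by vars, V_S the matching block of .vcov, and \nu the residual degrees of freedom (lmObject$df.residual).

```latex
\hat\theta = \alpha^\top \hat\beta_S, \qquad
\operatorname{se}(\hat\theta) = \sqrt{\alpha^\top V_S \, \alpha}, \qquad
t = \frac{\hat\theta}{\operatorname{se}(\hat\theta)}, \qquad
p = 2 \Pr\left(T_\nu > |t|\right), \quad T_\nu \sim t_\nu .
```

    The null hypothesis being tested is \alpha^\top \beta_S = 0, which is what the row name of the returned matrix spells out.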
    

    Consider a simple example with a balanced design for three groups A, B and C, whose true group means are 0.1, 1.1 and 2.1, respectively (the code below adds 0.1 to every observation).

    x <- gl(3,100,labels = LETTERS[1:3])
    set.seed(0)
    y <- c(rnorm(100, 0), rnorm(100, 1), rnorm(100, 2)) + 0.1
    
    fit <- lm(y ~ x)
    coef(summary(fit))
    
    #             Estimate Std. Error   t value     Pr(>|t|)
    #(Intercept) 0.1226684 0.09692277  1.265631 2.066372e-01
    #xB          0.9317800 0.13706949  6.797866 5.823987e-11
    #xC          2.0445528 0.13706949 14.916177 6.141008e-38
    

    Since A is the reference level, xB estimates B - A and xC estimates C - A. If we are now interested in the difference between groups B and C, i.e. C - B, we can use

    LinearCombTest(fit, c("xC", "xB"), c(1, -1))
    
    #                         Estimate Std. Error  t value     Pr(>|t|)
    #(1 * xC) + (-1 * xB) = 0 1.112773  0.1370695 8.118312 1.270686e-14
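
    As a sanity check, the same contrast can be compared against the refit that the question wants to avoid. The sketch below is not part of the original answer: it recomputes C - B by hand from the original fit (so it runs without LinearCombTest) and compares it with the coefficient from a model refitted with B as the reference level. The names x2, fit2 and refit are introduced here for illustration only.

```r
## same toy data and fit as in the answer
x <- gl(3, 100, labels = LETTERS[1:3])
set.seed(0)
y <- c(rnorm(100, 0), rnorm(100, 1), rnorm(100, 2)) + 0.1
fit <- lm(y ~ x)

## C - B from the original fit (reference level A), computed by hand
vars  <- c("xC", "xB")
alpha <- c(1, -1)
est <- sum(coef(fit)[vars] * alpha)
se  <- sqrt(drop(alpha %*% vcov(fit)[vars, vars] %*% alpha))

## refit with B as the reference level; coefficient `x2C` is then C - B
x2 <- relevel(x, ref = "B")
fit2 <- lm(y ~ x2)
refit <- coef(summary(fit2))["x2C", c("Estimate", "Std. Error")]

## both comparisons should return TRUE
all.equal(est, unname(refit["Estimate"]))
all.equal(se, unname(refit["Std. Error"]))
```

    Because the estimate and standard error agree, the t statistic and p-value agree as well, so the refit adds nothing beyond what the linear combination already gives.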
    

    Note that this function is also handy for working out the group means of B and C, that is, (Intercept) + xB and (Intercept) + xC (the reported p-value then tests whether the group mean is zero):

    LinearCombTest(fit, c("(Intercept)", "xB"), c(1, 1))
    
    #                                 Estimate Std. Error  t value     Pr(>|t|)
    #(1 * (Intercept)) + (1 * xB) = 0 1.054448 0.09692277 10.87926 2.007956e-23
    
    LinearCombTest(fit, c("(Intercept)", "xC"), c(1, 1))
    
    #                                 Estimate Std. Error  t value     Pr(>|t|)
    #(1 * (Intercept)) + (1 * xC) = 0 2.167221 0.09692277 22.36029 1.272811e-65
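
    In this one-way layout with no other covariates, these fitted group means should coincide with the raw sample means, and their standard error should equal the residual standard deviation divided by the square root of the group size. A minimal check in base R, with fitted_means, sample_means and se_mean being names I introduce here:

```r
## same toy data and fit as in the answer
x <- gl(3, 100, labels = LETTERS[1:3])
set.seed(0)
y <- c(rnorm(100, 0), rnorm(100, 1), rnorm(100, 2)) + 0.1
fit <- lm(y ~ x)

## fitted group means as linear combinations of the coefficients
beta <- coef(fit)
fitted_means <- c(A = beta[["(Intercept)"]],
                  B = beta[["(Intercept)"]] + beta[["xB"]],
                  C = beta[["(Intercept)"]] + beta[["xC"]])

## raw sample mean of y within each group
sample_means <- tapply(y, x, mean)

## standard error of each group mean: residual sd / sqrt(group size)
se_mean <- sigma(fit) / sqrt(as.vector(table(x)))

## should return TRUE
all.equal(unname(fitted_means), as.vector(sample_means))
se_mean
```

    With additional covariates in the model (such as Var2 and Var3 in the question) this shortcut no longer applies, and the linear-combination approach is the one to use.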
    

    Alternative solution with lsmeans

    Consider the above toy example again:

    library(lsmeans)
    lsmeans(fit, spec = "x", contr = "revpairwise")
    
    #$lsmeans
    # x    lsmean         SE  df    lower.CL  upper.CL
    # A 0.1226684 0.09692277 297 -0.06807396 0.3134109
    # B 1.0544484 0.09692277 297  0.86370603 1.2451909
    # C 2.1672213 0.09692277 297  1.97647888 2.3579637
    #
    #Confidence level used: 0.95 
    #
    #$contrasts
    # contrast estimate        SE  df t.ratio p.value
    # B - A    0.931780 0.1370695 297   6.798  <.0001
    # C - A    2.044553 0.1370695 297  14.916  <.0001
    # C - B    1.112773 0.1370695 297   8.118  <.0001
    #
    #P value adjustment: tukey method for comparing a family of 3 estimates
    

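    The $lsmeans component returns the marginal group means, while $contrasts returns the pairwise group mean differences, ordered as later level minus earlier level because we used the "revpairwise" contrast. See the lsmeans documentation (p. 32) for the difference between "pairwise" and "revpairwise".

    This gives a useful cross-check against LinearCombTest: the estimates, standard errors and t statistics agree. Note that the p-values in $contrasts are Tukey-adjusted (see the last line of the output), whereas LinearCombTest reports unadjusted p-values.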
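
    If lsmeans is not installed, the group means and confidence limits in the $lsmeans table can, as far as I can tell, be reproduced with base R's predict(). This is a sketch rather than a drop-in replacement; newdata and ci are names introduced here.

```r
## same toy data and fit as in the answer
x <- gl(3, 100, labels = LETTERS[1:3])
set.seed(0)
y <- c(rnorm(100, 0), rnorm(100, 1), rnorm(100, 2)) + 0.1
fit <- lm(y ~ x)

## one row per factor level, with the same level set as the fitted model
newdata <- data.frame(x = factor(levels(x), levels = levels(x)))

## fitted group means with 95% confidence limits
ci <- predict(fit, newdata, interval = "confidence", level = 0.95)
rownames(ci) <- levels(x)
ci
```

    The columns fit, lwr and upr correspond to lsmean, lower.CL and upper.CL. For a model with extra covariates, newdata would also need values for those covariates, which is the averaging step that lsmeans handles for you.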
