Get p-value for group mean difference without refitting linear model with a new reference level

名媛妹妹 2021-01-25 06:42

When we have a linear model with a factor variable X (with levels A, B, and C)

y ~ factor(X) + Var2 + Var3

the model summary only reports t-tests against the reference level A, i.e., for B - A and C - A. How can I get the p-value for the remaining group mean difference, C - B, without refitting the model with a different reference level?

3 Answers
  • 2021-01-25 07:15

    glht (general linear hypothesis testing) from the multcomp package makes this sort of multiple hypothesis test easy without re-running a bunch of separate models. Essentially, it builds a customized contrast matrix from the comparisons of interest that you specify.

    Using your example comparisons and building on the data @ZheyuanLi provided:

    x <- gl(3,100,labels = LETTERS[1:3])
    set.seed(0)
    y <- c(rnorm(100, 0), rnorm(100, 1), rnorm(100, 2)) + 0.1
    
    fit <- lm(y ~ x)
    
    library(multcomp)
    my_ht <- glht(fit, linfct = mcp(x = c("B-A = 0",
                                          "C-A = 0",
                                          "C-B = 0")))
    

    Calling summary(my_ht) gives the adjusted p-values for the comparisons of interest:

    #Linear Hypotheses:
    #           Estimate Std. Error t value Pr(>|t|)    
    #B - A == 0   0.9318     0.1371   6.798 1.11e-10 ***
    #C - A == 0   2.0446     0.1371  14.916  < 1e-10 ***
    #C - B == 0   1.1128     0.1371   8.118  < 1e-10 ***
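
    If you want to see what mcp() builds under the hood, you can also hand glht() an explicit contrast matrix. A minimal sketch on the same fit (rows are the comparisons; columns follow coef(fit), i.e. (Intercept), xB, xC):

    K <- rbind("B - A" = c(0,  1, 0),
               "C - A" = c(0,  0, 1),
               "C - B" = c(0, -1, 1))
    summary(glht(fit, linfct = K))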
    
  • 2021-01-25 07:17

    You could use the car package and its function linearHypothesis. It takes either formulas (character strings naming coefficients) or a hypothesis matrix to describe the system of equations you would like to test, and its vcov. parameter lets you supply a custom variance-covariance matrix for the model (by default it uses the model's own); see the sketch below.
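
    For the toy fit used in the other answers (fit <- lm(y ~ x), with coefficients (Intercept), xB, and xC), a minimal sketch of what this looks like:

    library(car)
    ## test C - B = 0, i.e. whether the xC and xB coefficients are equal
    linearHypothesis(fit, "xC - xB = 0")
    ## the same hypothesis as a matrix (columns follow coef(fit));
    ## pass e.g. vcov. = sandwich::vcovHC(fit) for a robust version
    linearHypothesis(fit, matrix(c(0, -1, 1), nrow = 1), rhs = 0)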

  • 2021-01-25 07:25

    You are looking for a linear hypothesis test, i.e., checking the p-value of a linear combination of regression coefficients. Based on my answer to How to conduct linear hypothesis test on regression coefficients with a clustered covariance matrix?, where we only considered a sum of coefficients, I will extend the function LinearCombTest to handle more general cases, taking alpha as the combination coefficients for the variables in vars:

    LinearCombTest <- function (lmObject, vars, alpha, .vcov = NULL) {
      ## if `.vcov` is missing, use the one returned by `lm`
      if (is.null(.vcov)) .vcov <- vcov(lmObject)
      ## estimated coefficients
      beta <- coef(lmObject)
      ## linear combination of `vars` with combination coefficients `alpha`
      LinearComb <- sum(beta[vars] * alpha)
      ## standard error of `LinearComb`: sqrt(alpha' V alpha)
      LinearComb_se <- sum(alpha * crossprod(.vcov[vars, vars], alpha)) ^ 0.5
      ## t-test on `LinearComb`
      tscore <- LinearComb / LinearComb_se
      pvalue <- 2 * pt(abs(tscore), lmObject$df.residual, lower.tail = FALSE)
      ## return a one-row summary matrix
      form <- paste0("(", paste(alpha, vars, sep = " * "), ")")
      form <- paste0(paste0(form, collapse = " + "), " = 0")
      matrix(c(LinearComb, LinearComb_se, tscore, pvalue), nrow = 1L,
             dimnames = list(form, c("Estimate", "Std. Error", "t value", "Pr(>|t|)")))
    }
    

    Consider a simple example, where we have a balanced design for three groups A, B, and C, with true group means 0.1, 1.1, and 2.1, respectively.

    x <- gl(3,100,labels = LETTERS[1:3])
    set.seed(0)
    y <- c(rnorm(100, 0), rnorm(100, 1), rnorm(100, 2)) + 0.1
    
    fit <- lm(y ~ x)
    coef(summary(fit))
    
    #             Estimate Std. Error   t value     Pr(>|t|)
    #(Intercept) 0.1226684 0.09692277  1.265631 2.066372e-01
    #xB          0.9317800 0.13706949  6.797866 5.823987e-11
    #xC          2.0445528 0.13706949 14.916177 6.141008e-38
    

    Since A is the reference level, xB gives B - A while xC gives C - A. Suppose we are now interested in the difference between groups B and C, i.e., C - B. We can use

    LinearCombTest(fit, c("xC", "xB"), c(1, -1))
    
    #                         Estimate Std. Error  t value     Pr(>|t|)
    #(1 * xC) + (-1 * xB) = 0 1.112773  0.1370695 8.118312 1.270686e-14
    

    Note that this function is also handy for working out the group means of B and C, that is, (Intercept) + xB and (Intercept) + xC:

    LinearCombTest(fit, c("(Intercept)", "xB"), c(1, 1))
    
    #                                 Estimate Std. Error  t value     Pr(>|t|)
    #(1 * (Intercept)) + (1 * xB) = 0 1.054448 0.09692277 10.87926 2.007956e-23
    
    LinearCombTest(fit, c("(Intercept)", "xC"), c(1, 1))
    
    #                                 Estimate Std. Error  t value     Pr(>|t|)
    #(1 * (Intercept)) + (1 * xC) = 0 2.167221 0.09692277 22.36029 1.272811e-65
    

    Alternative solution with lsmeans

    Consider the above toy example again:

    library(lsmeans)
    lsmeans(fit, spec = "x", contr = "revpairwise")
    
    #$lsmeans
    # x    lsmean         SE  df    lower.CL  upper.CL
    # A 0.1226684 0.09692277 297 -0.06807396 0.3134109
    # B 1.0544484 0.09692277 297  0.86370603 1.2451909
    # C 2.1672213 0.09692277 297  1.97647888 2.3579637
    #
    #Confidence level used: 0.95 
    #
    #$contrasts
    # contrast estimate        SE  df t.ratio p.value
    # B - A    0.931780 0.1370695 297   6.798  <.0001
    # C - A    2.044553 0.1370695 297  14.916  <.0001
    # C - B    1.112773 0.1370695 297   8.118  <.0001
    #
    #P value adjustment: tukey method for comparing a family of 3 estimates
    

    The $lsmeans component returns the marginal group means, while $contrasts returns the pairwise group mean differences, since we used the "revpairwise" contrast. See p. 32 of the lsmeans vignette for the difference between "pairwise" and "revpairwise"; a quick illustration follows.
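
    A minimal illustration on the same fit: "pairwise" simply reverses the direction of each comparison.

    ## "pairwise" gives A - B, A - C, B - C instead of B - A, C - A, C - B
    lsmeans(fit, spec = "x", contr = "pairwise")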

    This gives us a useful cross-check: the C - B estimate, standard error, and t value agree exactly with the result from LinearCombTest, so LinearCombTest is working correctly. (Note that the lsmeans p-values are Tukey-adjusted for the family of three comparisons, whereas LinearCombTest reports unadjusted p-values.)
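
    As a final sanity check (this is exactly the refit the question wants to avoid, used here only for verification), releveling x so that B is the reference level reproduces the C - B row:

    x2 <- relevel(x, ref = "B")   ## B becomes the reference level
    fit2 <- lm(y ~ x2)
    coef(summary(fit2))           ## the `x2C` row is C - B: estimate 1.112773,
                                  ## SE 0.1370695, t 8.118, p 1.270686e-14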
