partition of anova and comparisons (orthogonal single df) in r

问题

I want to do single df orthogonal contrast in anova (fixed or mixed model). Here is just example:

require(nlme)
data (Alfalfa)
  Variety: a factor with levels Cossack, Ladak, and Ranger
  Date : a factor with levels None S1 S20 O7
  Block: a factor with levels 1 2 3 4 5 6
  Yield : a numeric vector

These data are described in Snedecor and Cochran (1980) as an example of a split-plot design. The treatment structure used in the experiment was a 3\times4 full factorial, with three varieties of alfalfa and four dates of third cutting in 1943. The experimental units were arranged into six blocks, each subdivided into four plots. The varieties of alfalfa (Cossac, Ladak, and Ranger) were assigned randomly to the blocks and the dates of third cutting (None, S1—September 1, S20—September 20, and O7—October 7) were randomly assigned to the plots. All four dates were used on each block.

model<-with (Alfalfa, aov(Yield~Variety*Date +Error(Block/Date/Variety)))

    > summary(model)

Error: Block
          Df Sum Sq Mean Sq F value Pr(>F)
Residuals  5   4.15    0.83

Error: Block:Date
          Df Sum Sq Mean Sq F value   Pr(>F)
Date       3 1.9625  0.6542   17.84 3.29e-05 ***
Residuals 15 0.5501  0.0367
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Error: Block:Date:Variety
             Df Sum Sq Mean Sq F value Pr(>F)
Variety       2 0.1780 0.08901   1.719  0.192
Variety:Date  6 0.2106 0.03509   0.678  0.668
Residuals    40 2.0708 0.05177

I want to perform some comparison (orthogonal contrasts within a group), for example for date, two contrasts:

   (a) S1 vs others (S20 O7)
   (b) S20 vs 07,

For variety factor two contrasts:

  (c)  Cossack vs others (Ladak and Ranger)
   (d) Ladak vs Ranger

Thus the anova output would look like:

Error: Block
              Df Sum Sq Mean Sq F value Pr(>F)
Residuals  5   4.15    0.83

Error: Block:Date
          Df Sum Sq Mean Sq F value   Pr(>F)
Date       3 1.9625  0.6542   17.84 3.29e-05 ***
       (a) S1 vs others    ?        ?
       (b)  S20 vs 07       ?        ?
Residuals 15 0.5501  0.0367
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Error: Block:Date:Variety
             Df Sum Sq Mean Sq F value Pr(>F)
Variety       2 0.1780 0.08901   1.719  0.192
     (c)  Cossack vs others ?        ?    ?
     (d)  Ladak vs Ranger    ?       ?     ?
Variety:Date  6 0.2106 0.03509   0.678  0.668
Residuals    40 2.0708 0.05177

How can I perform this ? ....................

回答1:

First of all, why use ANOVA? You can use lme from the nlme package and in addition to the hypothesis tests aov gives you, you also get interpretable estimates of the effect sizes and the directions of the effects. At any rate, two approaches come to mind:

Specify contrasts on the variables manually, as explained here.
Install the multcomp package and use glht.

glht is a little opinionated about models that are multivariate in their predictors. Long story short, though, if you were to create a diagonal matrix cm0 with the same dimensions and dimnames as the vcov of your model (let's assume it's an lme fit called model0), then summary(glht(model0,linfct=cm0)) should give the same estimates, SEs, and test statistics as summary(model0)$tTable (but incorrect p-values). Now, if you mess around with linear combinations of rows from cm0 and create new matrices with the same number of columns as cm0 but these linear combinations as rows, you'll eventually figure out the pattern to creating a matrix that will give you the intercept estimate for each cell (check it against predict(model0,level=0)). Now, another matrix with differences between various rows of this matrix will give you corresponding between-group differences. The same approach but with numeric values set to 1 instead of 0 can be used to get the slope estimates for each cell. Then the differences between these slope estimates can be used to get between-group slope differences.

Three things to keep in mind:

As I said the p-values are going to be wrong for models other than lm, (possibly, haven't tried) aov, and certain survival models. This is because glht assumes a z distribution instead of a t distribution by default (except for lm). To get correct p-values, take the test statistic glht calculates and manually do 2*pt(abs(STAT),df=DF,lower=F) to get the two-tailed p-value where STAT is the test statistic returned by glht and DF is the df from the corresponding type of default contrast in summary(model0)$tTable.
Your contrasts probably no longer test independent hypotheses, and multiple testing correction is necessary, if it wasn't already. Run the p-values through p.adjust.
This is my own distillation of a lot of handwaving from professors and colleagues, and a lot of reading of Crossvalidated and Stackoverflow on related topics. I could be wrong in multiple ways, and if I am, hopefully someone more knowlegeable will correct us both.

来源：https://stackoverflow.com/questions/13893444/partition-of-anova-and-comparisons-orthogonal-single-df-in-r

标签

anova

contrast

orthogonal