问题
I want to do single df orthogonal contrast in anova (fixed or mixed model). Here is just example:
require(nlme)
data (Alfalfa)
Variety: a factor with levels Cossack, Ladak, and Ranger
Date : a factor with levels None S1 S20 O7
Block: a factor with levels 1 2 3 4 5 6
Yield : a numeric vector
These data are described in Snedecor and Cochran (1980) as an example of a split-plot design. The treatment structure used in the experiment was a 3\times4 full factorial, with three varieties of alfalfa and four dates of third cutting in 1943. The experimental units were arranged into six blocks, each subdivided into four plots. The varieties of alfalfa (Cossac, Ladak, and Ranger) were assigned randomly to the blocks and the dates of third cutting (None, S1—September 1, S20—September 20, and O7—October 7) were randomly assigned to the plots. All four dates were used on each block.
model<-with (Alfalfa, aov(Yield~Variety*Date +Error(Block/Date/Variety)))
> summary(model)
Error: Block
Df Sum Sq Mean Sq F value Pr(>F)
Residuals 5 4.15 0.83
Error: Block:Date
Df Sum Sq Mean Sq F value Pr(>F)
Date 3 1.9625 0.6542 17.84 3.29e-05 ***
Residuals 15 0.5501 0.0367
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Error: Block:Date:Variety
Df Sum Sq Mean Sq F value Pr(>F)
Variety 2 0.1780 0.08901 1.719 0.192
Variety:Date 6 0.2106 0.03509 0.678 0.668
Residuals 40 2.0708 0.05177
I want to perform some comparison (orthogonal contrasts within a group), for example for date, two contrasts:
(a) S1 vs others (S20 O7)
(b) S20 vs 07,
For variety factor two contrasts:
(c) Cossack vs others (Ladak and Ranger)
(d) Ladak vs Ranger
Thus the anova output would look like:
Error: Block
Df Sum Sq Mean Sq F value Pr(>F)
Residuals 5 4.15 0.83
Error: Block:Date
Df Sum Sq Mean Sq F value Pr(>F)
Date 3 1.9625 0.6542 17.84 3.29e-05 ***
(a) S1 vs others ? ?
(b) S20 vs 07 ? ?
Residuals 15 0.5501 0.0367
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Error: Block:Date:Variety
Df Sum Sq Mean Sq F value Pr(>F)
Variety 2 0.1780 0.08901 1.719 0.192
(c) Cossack vs others ? ? ?
(d) Ladak vs Ranger ? ? ?
Variety:Date 6 0.2106 0.03509 0.678 0.668
Residuals 40 2.0708 0.05177
How can I perform this ? ....................
回答1:
First of all, why use ANOVA? You can use lme
from the nlme
package and in addition to the hypothesis tests aov
gives you, you also get interpretable estimates of the effect sizes and the directions of the effects. At any rate, two approaches come to mind:
- Specify contrasts on the variables manually, as explained here.
- Install the
multcomp
package and useglht
.
glht
is a little opinionated about models that are multivariate in their predictors. Long story short, though, if you were to create a diagonal matrix cm0
with the same dimensions and dimnames as the vcov
of your model (let's assume it's an lme
fit called model0
), then summary(glht(model0,linfct=cm0))
should give the same estimates, SEs, and test statistics as summary(model0)$tTable
(but incorrect p-values). Now, if you mess around with linear combinations of rows from cm0
and create new matrices with the same number of columns as cm0
but these linear combinations as rows, you'll eventually figure out the pattern to creating a matrix that will give you the intercept estimate for each cell (check it against predict(model0,level=0)
). Now, another matrix with differences between various rows of this matrix will give you corresponding between-group differences. The same approach but with numeric values set to 1 instead of 0 can be used to get the slope estimates for each cell. Then the differences between these slope estimates can be used to get between-group slope differences.
Three things to keep in mind:
- As I said the p-values are going to be wrong for models other than
lm
, (possibly, haven't tried)aov
, and certain survival models. This is becauseglht
assumes az
distribution instead of at
distribution by default (except forlm
). To get correct p-values, take the test statisticglht
calculates and manually do2*pt(abs(
STAT),df=
DF,lower=F)
to get the two-tailed p-value where STAT is the test statistic returned byglht
and DF is thedf
from the corresponding type of default contrast insummary(model0)$tTable
. - Your contrasts probably no longer test independent hypotheses, and multiple testing correction is necessary, if it wasn't already. Run the p-values through
p.adjust
. - This is my own distillation of a lot of handwaving from professors and colleagues, and a lot of reading of Crossvalidated and Stackoverflow on related topics. I could be wrong in multiple ways, and if I am, hopefully someone more knowlegeable will correct us both.
来源:https://stackoverflow.com/questions/13893444/partition-of-anova-and-comparisons-orthogonal-single-df-in-r