问题
For my experiment, I clipped plants and measured their responses, such as leaf mass produced, at the end of the season. I manipulated both clipping intensity and clipping time and crossed these two treatments. I also included a control clipped treatment resulting in 5 different clipping treatment combinations. With 12 plants per treatment there is a total of 60 plants which I followed over the course of two years. That is, I collected measurements on these 60 plants in year 1 and the same plants again in year 2.
It would be simplest to just analyze the 5 different treatments separately. However, I would like to obtain the effects of timing and intensity and their interactions, but because of the control treatment which is not fully crossed with either timing or intensity, this makes my experimental design unbalanced and statistically tricky. To complicate this a bit more, I would like to include the effect of year into my model as well.
Ideally, I would be able to do this using lme4 which makes multiple comparison a breeze afterwards with the lsmeans package.
When I try to run my model
m1<-lmer(log(plant.leaf.g+1)~timing*intensity*year+(1|id), data=cmv) #not significant
I am met with the warning "fixed-effect model matrix is rank deficient so dropping 8 columns / coefficients".
Does anyone know of a way I can get this unbalanced mixed model to work with lme4?
Here is a subset of my data where "never" under timing and "zero" under intensity arbitrarily replaced "control" treatment:
id year timing intensity treatment plant.leaf.g
91 2015 early low early-low 315.944
92 2015 never zero control 99.28
93 2015 late high late-high 663.936
94 2015 early low early-low 25.488
95 2015 early high early-high 453.57
96 2015 late low late-low 90.804
97 2015 never zero control 1312.098
98 2015 late high late-high 959.82
99 2015 late low late-low 28.014
100 2015 late high late-high 178.56
91 2014 early low early-low 289.14
92 2014 never zero control 61.774
93 2014 late high late-high 639.936
94 2014 early low early-low 138.39
95 2014 early high early-high 168.216
96 2014 late low late-low 51.008
97 2014 never zero control 966.112
98 2014 late high late-high 279.048
99 2014 late low late-low 23.936
100 2014 late high late-high 169.344
cmv<-structure(list(id = c(91L, 92L, 93L, 94L, 95L, 96L, 97L, 98L,
99L, 100L, 101L, 102L, 103L, 105L, 106L, 107L, 108L, 109L, 110L,
91L, 92L, 93L, 94L, 95L, 96L, 97L, 98L, 99L, 100L, 101L, 102L,
103L, 104L, 105L, 106L, 107L, 108L, 109L, 110L), year = c(2015L,
2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L,
2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L,
2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L,
2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L,
2014L, 2014L), timing = structure(c(1L, 3L, 2L, 1L, 1L, 2L, 3L,
2L, 2L, 2L, 2L, 1L, 1L, 2L, 3L, 1L, 1L, 3L, 2L, 1L, 3L, 2L, 1L,
1L, 2L, 3L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 3L, 1L, 1L, 3L, 2L
), .Label = c("early", "late", "never"), class = "factor"), intensity = structure(c(2L,
3L, 1L, 2L, 1L, 2L, 3L, 1L, 2L, 1L, 2L, 1L, 1L, 2L, 3L, 2L, 1L,
3L, 1L, 2L, 3L, 1L, 2L, 1L, 2L, 3L, 1L, 2L, 1L, 2L, 1L, 1L, 2L,
2L, 3L, 2L, 1L, 3L, 1L), .Label = c("high", "low", "zero"), class = "factor"),
treatment = structure(c(3L, 1L, 4L, 3L, 2L, 5L, 1L, 4L, 5L,
4L, 5L, 2L, 2L, 5L, 1L, 3L, 2L, 1L, 4L, 3L, 1L, 4L, 3L, 2L,
5L, 1L, 4L, 5L, 4L, 5L, 2L, 2L, 5L, 5L, 1L, 3L, 2L, 1L, 4L
), .Label = c("control", "early-high", "early-low", "late-high",
"late-low"), class = "factor"), plant.stem.g = c(315.944,
99.28, 663.936, 25.488, 453.57, 90.804, 1312.098, 959.82,
28.014, 178.56, 158.12, 387.528, 288.75, 327.348, 770.44,
835.05, 457.188, 942.002, 229.194, 289.14, 61.774, 639.936,
138.39, 168.216, 51.008, 966.112, 279.048, 23.936, 169.344,
154.14, 703.04, 836.4, 511.92, 463.524, 245.226, 267.41,
439.392, 714.85, 68.012)), .Names = c("id", "year", "timing",
"intensity", "treatment", "plant.stem.g"), class = "data.frame", row.names = c(NA,
-39L))
Note: I have gotten m1=aov(plant.leaf.g~intensity*timing*year+Error(id), data=cmv)
to run, but I read that I should use Anova type="3" function from the car
package to obtain my p-values, but I haven't been able to do this with the Error(id) term. Nor have I been able do a multiple comparison with TukeyHSD
function or multcomp
package.
回答1:
There is nothing intrinsically wrong with
m1<-lmer(log(plant.leaf.g+1)~timing*intensity*year+(1|id),
data=cmv)
(except that log-transforming data with zeros in it is tricky; are you sure that adding 1 is correct? It only makes perfect sense if leaf mass is unitless. You might consider adding min(plant.leaf.g[plant.leaf.g>0])/2
instead ...)
The warning (not an error) occurs because you don't have all combinations of timing, intensity, and year in your data set, but you are asking R to estimate parameters for every combination. A few reasonable choices are:
- ignore the warning (you'll probably get reasonable answers anyway when comparing the overall effect of each factor)
- reduce the complexity of the model, in particular by eliminating the 3-way interaction (i.e. use
(timing+intensity+year)^2
) (I'm assuming this will work, but you might need to simplify the model still further if e.g. there are combinations of timing and intensity that are missing from your data) - construct a one-way ANOVA from the 3-way interaction, e.g.
cmv$int <- with(cmv,interaction(timing,intensity,year,drop=TRUE))
(but then you won't be able to separate main effects and interactions)
来源:https://stackoverflow.com/questions/39135939/r-analyzing-repeated-measures-unbalanced-design-with-lme4