R: analyzing a repeated-measures unbalanced design with lme4?

Question

For my experiment, I clipped plants and measured their responses, such as leaf mass produced, at the end of the season. I manipulated both clipping intensity and clipping time and crossed these two treatments. I also included a control treatment, resulting in 5 different clipping treatment combinations. With 12 plants per treatment, there are 60 plants in total, which I followed over the course of two years. That is, I collected measurements on these 60 plants in year 1 and on the same plants again in year 2.

It would be simplest to just analyze the 5 different treatments separately. However, I would like to obtain the effects of timing and intensity and their interaction. Because the control treatment is not fully crossed with either timing or intensity, my experimental design is unbalanced and statistically tricky. To complicate this a bit more, I would like to include the effect of year in my model as well.

Ideally, I would be able to do this using lme4, which makes multiple comparisons a breeze afterwards with the lsmeans package.

When I try to run my model

     m1<-lmer(log(plant.leaf.g+1)~timing*intensity*year+(1|id), data=cmv) #not significant

I am met with the warning "fixed-effect model matrix is rank deficient so dropping 8 columns / coefficients".

Does anyone know of a way I can get this unbalanced mixed model to work with lme4?

Here is a subset of my data, where "never" under timing and "zero" under intensity arbitrarily stand in for the "control" treatment:

id   year  timing  intensity  treatment   plant.leaf.g
91   2015  early   low        early-low   315.944
92   2015  never   zero       control     99.28
93   2015  late    high       late-high   663.936
94   2015  early   low        early-low   25.488
95   2015  early   high       early-high  453.57
96   2015  late    low        late-low    90.804
97   2015  never   zero       control     1312.098
98   2015  late    high       late-high   959.82
99   2015  late    low        late-low    28.014
100  2015  late    high       late-high   178.56
91   2014  early   low        early-low   289.14
92   2014  never   zero       control     61.774
93   2014  late    high       late-high   639.936
94   2014  early   low        early-low   138.39
95   2014  early   high       early-high  168.216
96   2014  late    low        late-low    51.008
97   2014  never   zero       control     966.112
98   2014  late    high       late-high   279.048
99   2014  late    low        late-low    23.936
100  2014  late    high       late-high   169.344

cmv<-structure(list(id = c(91L, 92L, 93L, 94L, 95L, 96L, 97L, 98L, 
99L, 100L, 101L, 102L, 103L, 105L, 106L, 107L, 108L, 109L, 110L, 
91L, 92L, 93L, 94L, 95L, 96L, 97L, 98L, 99L, 100L, 101L, 102L, 
103L, 104L, 105L, 106L, 107L, 108L, 109L, 110L), year = c(2015L, 
2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 
2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 
2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 
2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 
2014L, 2014L), timing = structure(c(1L, 3L, 2L, 1L, 1L, 2L, 3L, 
2L, 2L, 2L, 2L, 1L, 1L, 2L, 3L, 1L, 1L, 3L, 2L, 1L, 3L, 2L, 1L, 
1L, 2L, 3L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 3L, 1L, 1L, 3L, 2L
), .Label = c("early", "late", "never"), class = "factor"), intensity =     structure(c(2L, 
3L, 1L, 2L, 1L, 2L, 3L, 1L, 2L, 1L, 2L, 1L, 1L, 2L, 3L, 2L, 1L, 
3L, 1L, 2L, 3L, 1L, 2L, 1L, 2L, 3L, 1L, 2L, 1L, 2L, 1L, 1L, 2L, 
2L, 3L, 2L, 1L, 3L, 1L), .Label = c("high", "low", "zero"), class = "factor"), 
treatment = structure(c(3L, 1L, 4L, 3L, 2L, 5L, 1L, 4L, 5L, 
4L, 5L, 2L, 2L, 5L, 1L, 3L, 2L, 1L, 4L, 3L, 1L, 4L, 3L, 2L, 
5L, 1L, 4L, 5L, 4L, 5L, 2L, 2L, 5L, 5L, 1L, 3L, 2L, 1L, 4L
), .Label = c("control", "early-high", "early-low", "late-high", 
"late-low"), class = "factor"), plant.stem.g = c(315.944, 
99.28, 663.936, 25.488, 453.57, 90.804, 1312.098, 959.82, 
28.014, 178.56, 158.12, 387.528, 288.75, 327.348, 770.44, 
835.05, 457.188, 942.002, 229.194, 289.14, 61.774, 639.936, 
138.39, 168.216, 51.008, 966.112, 279.048, 23.936, 169.344, 
154.14, 703.04, 836.4, 511.92, 463.524, 245.226, 267.41, 
439.392, 714.85, 68.012)), .Names = c("id", "year", "timing", 
"intensity", "treatment", "plant.stem.g"), class = "data.frame", row.names =     c(NA, 
-39L))

Note: I have gotten m1=aov(plant.leaf.g~intensity*timing*year+Error(id), data=cmv) to run, but I read that I should use the Anova function from the car package with type="3" to obtain my p-values, and I haven't been able to do this with the Error(id) term. Nor have I been able to do multiple comparisons with the TukeyHSD function or the multcomp package.

Answer 1:

There is nothing intrinsically wrong with

 m1<-lmer(log(plant.leaf.g+1)~timing*intensity*year+(1|id), 
          data=cmv)

(except that log-transforming data with zeros in it is tricky; are you sure that adding 1 is correct? It only makes perfect sense if leaf mass is unitless. You might consider adding min(plant.leaf.g[plant.leaf.g>0])/2 instead ...)
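
As a minimal sketch of the alternative offset suggested above: the values below are made up for illustration (including the zero), and the variable names are hypothetical rather than taken from the asker's data.

```r
# Hypothetical leaf masses in grams, including one zero
leaf <- c(0, 25.488, 99.28, 315.944)

# Half of the smallest positive observation, as suggested above
offset <- min(leaf[leaf > 0]) / 2

# Log-transform with the data-driven offset instead of a fixed +1
log_leaf <- log(leaf + offset)
print(log_leaf)
```

Unlike adding 1, this offset scales with the units of the measurement, so the result does not change character if the mass is recorded in, say, milligrams instead of grams.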

The warning (not an error) occurs because you don't have all combinations of timing, intensity, and year in your data set, but you are asking R to estimate parameters for every combination. A few reasonable choices are:

- Ignore the warning (you'll probably get reasonable answers anyway when comparing the overall effect of each factor).
- Reduce the complexity of the model, in particular by eliminating the 3-way interaction, i.e. use (timing+intensity+year)^2. (I'm assuming this will work, but you might need to simplify the model still further if, e.g., there are combinations of timing and intensity that are missing from your data.)
- Construct a one-way ANOVA from the 3-way interaction, e.g. cmv$int <- with(cmv,interaction(timing,intensity,year,drop=TRUE)) (but then you won't be able to separate main effects and interactions).
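
To see where the warning comes from and how the options above compare, here is a rough sketch in base R. It uses a small toy design, not the asker's data: the five treatment cells observed in both years, with year treated as a factor. The helper rank_check() and the names toy, plant and y are hypothetical. The lmer() call at the end only runs if lme4 is installed and uses a simulated response.

```r
# Toy design (hypothetical, not the asker's data): the 5 treatment cells,
# each observed in both years, with 3 plants per cell.
cells <- data.frame(
  timing    = c("never", "early", "early", "late", "late"),
  intensity = c("zero",  "low",   "high",  "low",  "high"),
  stringsAsFactors = TRUE
)
plants <- merge(cells, data.frame(plant = 1:3))   # Cartesian product
toy    <- merge(plants, data.frame(year = factor(c(2014, 2015))))
toy$id  <- with(toy, interaction(timing, intensity, plant, drop = TRUE))
toy$int <- with(toy, interaction(timing, intensity, year, drop = TRUE))

# Which timing x intensity combinations were never observed?
tab <- xtabs(~ timing + intensity, data = toy)
print(tab)

# rank_check() is a hypothetical helper: coefficients requested by a
# fixed-effects formula vs. the number that can actually be estimated.
rank_check <- function(formula, data) {
  X <- model.matrix(formula, data = data)
  c(columns = ncol(X), rank = qr(X)$rank)
}

checks <- rbind(
  full    = rank_check(~ timing * intensity * year, toy),
  two_way = rank_check(~ (timing + intensity + year)^2, toy),
  one_way = rank_check(~ int, toy)
)
print(checks)

# Fit the one-way coding if lme4 is available (simulated response only).
if (requireNamespace("lme4", quietly = TRUE)) {
  set.seed(1)
  toy$y <- rnorm(nrow(toy))
  fit <- lme4::lmer(y ~ int + (1 | id), data = toy)
  print(lme4::fixef(fit))
}
```

On this toy design the full formula requests 18 coefficients but only 10 are estimable, which is consistent with the 8 dropped columns in the warning. The two-way formula should also still come out rank deficient because of the empty timing-by-intensity cells, while the one-way coding is full rank. With the asker's data, crossing the existing treatment column with year (treatment*year) would be an equivalent full-rank coding, assuming every treatment was observed in both years.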

Source: https://stackoverflow.com/questions/39135939/r-analyzing-repeated-measures-unbalanced-design-with-lme4

Tags

mixed-models