I am fitting some lines and I feel like I am telling R exactly how to fit them, but I feel like there is something (some factor or effect) I am unaware of that is preventing a g
This answer is going to be a bit encyclopedic, because there are several important points about your question. tl;dr adding year*plot
as a random effect is the first step, but the fit is actually a bit problematic, although it doesn't appear so at first: centering the year
variable takes care of the problem, but log-transforming the data might be an even better idea.
UPDATE: OP was doing an analysis on subset(df,Depth==30)
. I accidentally did the analysis on the full data set, which may have made things harder -- the heteroscedasticity documented below probably isn't as bad for a subset of the data -- but I'm going to leave it as is for pedagogical purposes (and out of laziness).
Get the data:
df <- read.csv("rootmeansv2.csv")
library(nlme)
gdf <- groupedData( mass ~ year | plot, data=df)
Adding the year-by-plot interaction to the model as a random effect:
fit0 <- lme(mass ~ year , random = ~ year | plot, data = gdf)
However, if we look at the results we see that this doesn't help very much at all -- the year
term (among-plot variance in the slope) is tiny.
## Variance StdDev Corr
## (Intercept) 9.522880e-12 3.085916e-06 (Intr)
## year 7.110616e-08 2.666574e-04 0.32
## Residual 3.885373e-01 6.233276e-01
This does sometimes happen (a singular fit), but otherwise the lme
summary doesn't indicate in any other way that there might be something wrong. However, if we try to get confidence intervals we see there might be trouble:
intervals(fit0)
## Error in intervals.lme(fit0) :
## cannot get confidence intervals on var-cov components: Non-positive definite approximate variance-covariance
The other way to double-check is to fit the same model in lme4
.
library(lme4)
VarCorr(fit1 <- lmer(mass ~ year +(year | plot), data = gdf))
## Groups Name Std.Dev. Corr
## plot (Intercept) 0.66159674
## year 0.00059471 -1.000
## Residual 0.62324403
## Warning messages:
## 1: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
## Model failed to converge with max|grad| = 0.132739 (tol = 0.002, component 1)
## 2: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
## Model failed to converge: degenerate Hessian with 1 negative eigenvalues
We get apparently quite different answers, and warnings about convergence (this is in the development version, 1.1-7, which does not suffer from the false-positive warnings reported widely for 1.1-6). It looks like lme4
's fit is slightly better:
c(logLik(fit1)-logLik(fit0))
## 0.1775161
One of the potential data problems for complex fits such as mixed models is poor centering and scaling of predictor variables. In this case, because year
is a continuous predictor that starts at 2008, the intercept and slopes are extremely highly correlated (the intercept is the predicted value in year 0!). We can fix the problem in this case by centering -- it would also be reasonable to subtract the minimum (i.e. start year at 0 rather than 2008)
c. <- function(x) scale(x,center=TRUE,scale=FALSE)
VarCorr(fit2 <- update(fit1,.~ c.(year) +(c.(year) | plot)))
## Groups Name Std.Dev. Corr
## plot (Intercept) 0.53798
## c.(year) 0.10515 1.000
## Residual 0.59634
We get more reasonable answers (and no warnings), although we do still have perfectly (now positively) correlated intercepts and slopes -- this just says the model is slightly overfitting the data, but there's not much we can do about this (we could force the correlation to zero by fitting the model ~c.(year)+(c.(year)||plot))
, but this has its own problems).
Now try it in lme
:
VarCorr(fit3 <- update(fit0,
fixed.=~c.(year),
random=~c.(year)|plot,
control=lmeControl(opt="optim")))
## plot = pdLogChol(c.(year))
## Variance StdDev Corr
## (Intercept) 0.28899909 0.5375864 (Intr)
## c.(year) 0.01122497 0.1059479 0.991
## Residual 0.35559015 0.5963138
Results are similar (although correlation is only 0.991 instead of 1.0): the difference in log-likelihoods is actually slightly larger, but still small. (The fit is still somewhat problematic -- I had to change optimizers as shown in the lmeControl
argument to get here.)
Things now look better:
plot(augPred(fit3))
However, the residual vs fitted plot shows you a problem you should worry about:
plot(fit3)
The funnel shape shows you have a heteroscedasticity problem. The scale-location plot shows it even more clearly:
plot(fit3,sqrt(abs(resid(.)))~fitted(.),type=c("p","smooth"))
The most obvious fix is to log-transform the data:
df$logmass <- log10(df$mass) ## log10() for interpretability
gdfL <- groupedData( logmass ~ year | plot, data=df)
VarCorr(fitL1 <- lme(logmass ~ c.(year) , random = ~ c.(year) | plot,
data = gdfL))
## plot = pdLogChol(c.(year))
## Variance StdDev Corr
## (Intercept) 0.1842905872 0.42929080 (Intr)
## c.(year) 0.0009702893 0.03114947 -0.607
## Residual 0.1052946948 0.32449144
The among-year variation is small again, but this time it's probably because it's not needed. There are no warnings from lmer
when we fit the equivalent model, and we get identical results; fit is non-singular (no zero variances or perfect correlations); and intervals(fitL1)
works.
The predictions look reasonable:
plot(augPred(fitL1))
So does the diagnostic plot:
plot(fitL1)
Although there does appear to be something a bit funny about 2008 (the axes are flipped because lme
prefers to plot boxplots horizontally -- factor(year)
tells lme
to use a boxplot instead of a scatterplot).
plot(fitL1,factor(year)~resid(.))