Better fits for a linear model

后端 未结 1 1314

I am fitting some lines and I feel like I am telling R exactly how to fit them, but I feel like there is something (some factor or effect) I am unaware of that is preventing a g

1条回答
  •  时光说笑
    2021-02-06 16:09

    This answer is going to be a bit encyclopedic, because there are several important points about your question. tl;dr adding year*plot as a random effect is the first step, but the fit is actually a bit problematic, although it doesn't appear so at first: centering the year variable takes care of the problem, but log-transforming the data might be an even better idea.

    UPDATE: OP was doing an analysis on subset(df,Depth==30). I accidentally did the analysis on the full data set, which may have made things harder -- the heteroscedasticity documented below probably isn't as bad for a subset of the data -- but I'm going to leave it as is for pedagogical purposes (and out of laziness).

    Get the data:

    df <- read.csv("rootmeansv2.csv")
    library(nlme)
    gdf <- groupedData( mass ~ year | plot, data=df)
    

    Adding the year-by-plot interaction to the model as a random effect:

    fit0 <- lme(mass ~ year , random = ~ year | plot, data = gdf)
    

    However, if we look at the results we see that this doesn't help very much at all -- the year term (among-plot variance in the slope) is tiny.

    ##             Variance     StdDev       Corr  
    ## (Intercept) 9.522880e-12 3.085916e-06 (Intr)
    ## year        7.110616e-08 2.666574e-04 0.32  
    ## Residual    3.885373e-01 6.233276e-01       
    

    This does sometimes happen (a singular fit), but otherwise the lme summary doesn't indicate in any other way that there might be something wrong. However, if we try to get confidence intervals we see there might be trouble:

     intervals(fit0)
     ##   Error in intervals.lme(fit0) : 
     ##   cannot get confidence intervals on var-cov components: Non-positive definite approximate variance-covariance
    

    The other way to double-check is to fit the same model in lme4.

    library(lme4)
    VarCorr(fit1 <- lmer(mass ~ year +(year | plot), data = gdf))
    
    ##  Groups   Name        Std.Dev.   Corr  
    ##  plot     (Intercept) 0.66159674       
    ##           year        0.00059471 -1.000
    ##  Residual             0.62324403       
    ## Warning messages:
    ## 1: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,  :
    ##   Model failed to converge with max|grad| = 0.132739 (tol = 0.002, component 1)
    ## 2: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,  :
    ##   Model failed to converge: degenerate  Hessian with 1 negative eigenvalues
    

    We get apparently quite different answers, and warnings about convergence (this is in the development version, 1.1-7, which does not suffer from the false-positive warnings reported widely for 1.1-6). It looks like lme4's fit is slightly better:

    c(logLik(fit1)-logLik(fit0))
    ## 0.1775161
    

    One of the potential data problems for complex fits such as mixed models is poor centering and scaling of predictor variables. In this case, because year is a continuous predictor that starts at 2008, the intercept and slopes are extremely highly correlated (the intercept is the predicted value in year 0!). We can fix the problem in this case by centering -- it would also be reasonable to subtract the minimum (i.e. start year at 0 rather than 2008)

    c. <- function(x) scale(x,center=TRUE,scale=FALSE)
    VarCorr(fit2 <- update(fit1,.~ c.(year) +(c.(year) | plot)))
    
    ##  Groups   Name        Std.Dev. Corr 
    ##  plot     (Intercept) 0.53798       
    ##           c.(year)    0.10515  1.000
    ##  Residual             0.59634       
    

    We get more reasonable answers (and no warnings), although we do still have perfectly (now positively) correlated intercepts and slopes -- this just says the model is slightly overfitting the data, but there's not much we can do about this (we could force the correlation to zero by fitting the model ~c.(year)+(c.(year)||plot)), but this has its own problems).

    Now try it in lme:

    VarCorr(fit3 <- update(fit0,
                           fixed.=~c.(year),
                           random=~c.(year)|plot,
             control=lmeControl(opt="optim")))
    ## plot = pdLogChol(c.(year)) 
    ##             Variance   StdDev    Corr  
    ## (Intercept) 0.28899909 0.5375864 (Intr)
    ## c.(year)    0.01122497 0.1059479 0.991 
    ## Residual    0.35559015 0.5963138       
    

    Results are similar (although correlation is only 0.991 instead of 1.0): the difference in log-likelihoods is actually slightly larger, but still small. (The fit is still somewhat problematic -- I had to change optimizers as shown in the lmeControl argument to get here.)

    Things now look better:

    plot(augPred(fit3))
    

    enter image description here

    However, the residual vs fitted plot shows you a problem you should worry about:

    plot(fit3)
    

    enter image description here

    The funnel shape shows you have a heteroscedasticity problem. The scale-location plot shows it even more clearly:

    plot(fit3,sqrt(abs(resid(.)))~fitted(.),type=c("p","smooth"))
    

    enter image description here

    The most obvious fix is to log-transform the data:

    df$logmass <- log10(df$mass)  ## log10() for interpretability
    gdfL <- groupedData( logmass ~ year | plot, data=df)
    VarCorr(fitL1 <- lme(logmass ~ c.(year) , random = ~ c.(year) | plot,
                 data = gdfL))
    ## plot = pdLogChol(c.(year)) 
    ##             Variance     StdDev     Corr  
    ## (Intercept) 0.1842905872 0.42929080 (Intr)
    ## c.(year)    0.0009702893 0.03114947 -0.607
    ## Residual    0.1052946948 0.32449144       
    

    The among-year variation is small again, but this time it's probably because it's not needed. There are no warnings from lmer when we fit the equivalent model, and we get identical results; fit is non-singular (no zero variances or perfect correlations); and intervals(fitL1) works.

    The predictions look reasonable:

    plot(augPred(fitL1))
    

    enter image description here

    So does the diagnostic plot:

    plot(fitL1)
    

    enter image description here

    Although there does appear to be something a bit funny about 2008 (the axes are flipped because lme prefers to plot boxplots horizontally -- factor(year) tells lme to use a boxplot instead of a scatterplot).

    plot(fitL1,factor(year)~resid(.))
    

    enter image description here

    0 讨论(0)
提交回复
热议问题