Question
I am trying to run variable selection on Poisson mixed-effects models using glmer() and dredge(). Since several variables are collinear, I use the subset argument of dredge() to exclude combinations of correlated variables. However, dredge() needs a global model that includes all terms, and that global model can be rank-deficient.
[edited Feb 15 2016] To give a reproducible example, let's generate a random data set:
dfdat<-data.frame(replicate(6, round(rnorm(6),2)))
dfdat$group<-factor(sample(1:2,nrow(dfdat),replace=T))
dfdat$Y<-rpois(nrow(dfdat),10)+rpois(nrow(dfdat),as.numeric(dfdat$group))
dfdat
X1 X2 X3 X4 X5 X6 group Y
1 -0.88 0.05 1.33 -1.51 0.61 -0.09 2 8
2 -0.12 -0.57 0.05 -1.12 0.60 -0.41 1 7
3 0.14 -0.97 -1.04 0.40 0.87 0.27 1 9
4 -1.04 -0.26 -1.33 0.77 -1.84 1.67 1 11
5 -1.06 1.10 -0.09 0.50 -2.62 2.15 1 10
6 -1.74 -0.61 0.72 -0.29 -0.30 -0.93 1 8
Trying to run a model with all 6 terms does not work as the model is rank-deficient:
library(MuMIn) # provides dredge() and model.sel()
library(lme4) # provides glmer()
vars<-names(dfdat)[1:6]
form<-formula(paste0('Y~',paste0(vars,collapse='+'),'+(1|group)'))
fmod<-glmer(form,data=dfdat,family='poisson')
fixed-effect model matrix is rank deficient so dropping 1 column / coefficient
Error: pwrssUpdate did not converge in (maxit) iterations
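The rank deficiency can be checked directly on the fixed-effect design matrix. The following is a minimal sketch in base R: with 6 observations the matrix has 6 rows but 7 columns (intercept plus X1 to X6), so its rank cannot exceed 6. The seed is an arbitrary assumption (the post sets none), so the generated values differ from the table above.

```r
set.seed(42)  # arbitrary seed, not part of the original post
dfdat <- data.frame(replicate(6, round(rnorm(6), 2)))

# Fixed-effect design matrix: intercept + X1..X6
X <- model.matrix(~ X1 + X2 + X3 + X4 + X5 + X6, data = dfdat)

dim(X)              # 6 rows, 7 columns
rk <- qr(X)$rank    # numerical rank, bounded above by the number of rows
rk < ncol(X)        # TRUE: at least one column has to be dropped
```

Any model with more fixed-effect columns than observations hits this limit, whatever the actual values are.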
Using dredge() on fmod would always exclude the one variable that glmer() dropped.
The solution suggested here seems to be to (1) fit a model that converges, and (2) trick dredge() into considering the full list of variables by changing the formula in the converged model.
## full model is rank deficient, so use smaller subset
vars.red<-vars[1:3]
form.red<-formula(paste0('Y~',paste0(vars.red,collapse='+'),'+(1|group)'))
fmod.red<-glmer(form.red,data=dfdat,family='poisson')
This new model, fmod.red, converges, but it only includes the variables X1, X2 and X3.
Now to the "tricking dredge" part. The solution proposed on the page linked above didn't work as such with glmer(), because the structure of merMod objects is different from that of gamm objects. So I tried to use:
fmod.red@call$formula<-form
where form has all my covariates (to be subsetted).
This didn't work, but using the formula in the frame element, as suggested by Kamil Bartoń below, did work:
# replace formula in the frame element of fmod.red
attr(fmod.red@frame,"formula")<-form
# check
formula(fmod.red)
# now apply dredge function with covariates
# exclude variable combinations (randomly chosen for the sake of example)
sexpr<-expression(!((X1 && X3) || (X1&&X6) || (X4 && X6) || (X4 && X5)))
# run dredge()
options(na.action = na.fail)
ms<-dredge(fmod.red,subset=sexpr)
UPDATE
While ms seemed to include all variables, as shown by:
names(ms)
[1] "(Intercept)" "X1" "X2" "X3" "X4" "X5" "X6"
[8] "df" "logLik" "AICc" "delta" "weight"
the new variables (X4,X5,X6) were never actually included (NAs everywhere):
summary(ms)
(Intercept) X1 X2 X3 X4 X5 X6
Min. :2.407 Min. :0.09698 Min. :-0.4026 Min. :-0.42078 + : 0 + : 0 + : 0
1st Qu.:2.443 1st Qu.:0.22688 1st Qu.:-0.3204 1st Qu.:-0.35303 NA's:26 NA's:26 NA's:26
Median :2.474 Median :0.27361 Median :-0.2980 Median :-0.22444
Mean :2.535 Mean :0.27539 Mean :-0.3059 Mean :-0.23517
3rd Qu.:2.515 3rd Qu.:0.32357 3rd Qu.:-0.2718 3rd Qu.:-0.17472
Max. :3.009 Max. :0.45664 Max. :-0.2177 Max. : 0.08802
NA's :20 NA's :13 NA's :16
What is happening?
Answer 1:
In "merMod" objects, the formula is first looked for at attr(<object>@frame, "formula") (see the function code of getS3method("formula", "merMod")). So replacing it in the call element was not effective, which can be tested with formula() or getAllTerms(). Replace the "formula" attribute of @frame instead.
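The lookup order can be verified on any fitted merMod object. This is a small sketch that uses the sleepstudy data shipped with lme4 rather than the data from the question. The replacement formula new.form is purely illustrative.

```r
library(lme4)

fm <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)
new.form <- Reaction ~ 1 + (1 | Subject)  # illustrative replacement formula

# Changing the call has no effect on what formula() reports
fm@call$formula <- new.form
f.call <- formula(fm)
"Days" %in% all.vars(f.call)    # TRUE: still the fitted formula

# Changing the attribute on @frame does
attr(fm@frame, "formula") <- new.form
f.frame <- formula(fm)
"Days" %in% all.vars(f.frame)   # FALSE: formula() now returns new.form
```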
Edit: it turns out it isn't that easy to trick dredge(), because it also looks at coef (or fixef in this case) when building the table. To work around that, first generate the calls, evaluate them, then build the table with model.sel:
model.sel(lapply(dredge(..., evaluate = FALSE), eval), ...)
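Put together, the workaround might look like the sketch below. It makes several assumptions that are not in the original post: a synthetic data set dat with an exactly collinear covariate (X4 = X1 + X2), six groups, an arbitrary seed, and a subset rule that only forbids the collinear combination. Behaviour may also vary between MuMIn versions, so treat it as an outline rather than a tested recipe.

```r
library(lme4)
library(MuMIn)

set.seed(1)  # arbitrary seed (assumption)
n <- 60
dat <- data.frame(X1 = rnorm(n), X2 = rnorm(n), X3 = rnorm(n))
dat$X4 <- dat$X1 + dat$X2                  # exact linear combination of X1 and X2
dat$group <- factor(rep(1:6, each = 10))
g.eff <- rnorm(6, sd = 0.3)                # simulated group effects
dat$Y <- rpois(n, exp(2 + 0.3 * dat$X1 + g.eff[dat$group]))

options(na.action = "na.fail")             # required by dredge()

# Fit a full-rank subset, then advertise the full formula via @frame
form.full <- Y ~ X1 + X2 + X3 + X4 + (1 | group)
fm.red <- glmer(Y ~ X1 + X2 + (1 | group), data = dat, family = poisson)
attr(fm.red@frame, "formula") <- form.full

# Generate the calls only, skipping models that contain X1, X2 and X4 together
calls <- dredge(fm.red, subset = !(X1 && X2 && X4), evaluate = FALSE)

# Fit each candidate model, then build the selection table from the fits
fits <- lapply(calls, function(cl) eval(cl))
ms <- model.sel(fits)
ms
```

Because every candidate is refitted from its own call, the coefficients of terms missing from the reduced model (here X3 and X4) are taken from the individual fits and not from fm.red.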
Source: https://stackoverflow.com/questions/35363447/mumin-dredge-when-global-mixed-effects-model-is-rank-deficient