问题
I've had great experiences asking for help here before and I'm hoping to get some help again.
I'm estimating a rather large mixed effects model in which one of the random effects has over 150 different levels. That would make a standard caterpillar plot to be quite unreadable.
I would like, if all possible, to get a caterpillar plot of just the levels of the random effect that are, for a lack of better term, "significant". That is: I want a caterpillar plot in which either the random intercept or the random slope for a varying coefficient has a "confidence interval" (I know that's not quite what it is) that does not include zero.
Consider this standard model from the sleepstudy
data that is standard with lme4
.
library(lme4)
fit <- lmer(Reaction ~ Days + (Days|Subject), sleepstudy)
ggCaterpillar(ranef(fit,condVar=TRUE), QQ=FALSE, likeDotplot=TRUE, reorder=FALSE)[["Subject"]]
I would get this caterpillar plot.
The caterpillar plot I use comes from this code. Do note I tend to use less conservative bounds for the intervals (i.e. 1.645*se and not 1.96*se).
Basically, I want a caterpillar plot that would just include the levels for 308, 309, 310, 330, 331, 335, 337, 349, 350, 352, and 370 because those levels had either intercepts or slopes whose intervals did not include zero. I ask because my caterpillar plot of over 150 different levels is kind of unreadable and I think this might be a worthwhile solution to it.
Reproducible code follows. I genuinely appreciate any help.
# https://stackoverflow.com/questions/34120578/how-can-i-sort-random-effects-by-value-of-the-random-effect-not-the-intercept
ggCaterpillar <- function(re, QQ=TRUE, likeDotplot=TRUE, reorder=TRUE) {
require(ggplot2)
f <- function(x) {
pv <- attr(x, "postVar")
cols <- 1:(dim(pv)[1])
se <- unlist(lapply(cols, function(i) sqrt(pv[i, i, ])))
if (reorder) {
ord <- unlist(lapply(x, order)) + rep((0:(ncol(x) - 1)) * nrow(x), each=nrow(x))
pDf <- data.frame(y=unlist(x)[ord],
ci=1.645*se[ord],
nQQ=rep(qnorm(ppoints(nrow(x))), ncol(x)),
ID=factor(rep(rownames(x), ncol(x))[ord], levels=rownames(x)[ord]),
ind=gl(ncol(x), nrow(x), labels=names(x)))
} else {
pDf <- data.frame(y=unlist(x),
ci=1.645*se,
nQQ=rep(qnorm(ppoints(nrow(x))), ncol(x)),
ID=factor(rep(rownames(x), ncol(x)), levels=rownames(x)),
ind=gl(ncol(x), nrow(x), labels=names(x)))
}
if(QQ) { ## normal QQ-plot
p <- ggplot(pDf, aes(nQQ, y))
p <- p + facet_wrap(~ ind, scales="free")
p <- p + xlab("Standard normal quantiles") + ylab("Random effect quantiles")
} else { ## caterpillar dotplot
p <- ggplot(pDf, aes(ID, y)) + coord_flip()
if(likeDotplot) { ## imitate dotplot() -> same scales for random effects
p <- p + facet_wrap(~ ind)
} else { ## different scales for random effects
p <- p + facet_grid(ind ~ ., scales="free_y")
}
p <- p + xlab("Levels of the Random Effect") + ylab("Random Effect")
}
p <- p + theme(legend.position="none")
p <- p + geom_hline(yintercept=0)
p <- p + geom_errorbar(aes(ymin=y-ci, ymax=y+ci), width=0, colour="black")
p <- p + geom_point(aes(size=1.2), colour="blue")
return(p)
}
lapply(re, f)
}
library(lme4)
fit <- lmer(Reaction ~ Days + (Days|Subject), sleepstudy)
ggCaterpillar(ranef(fit,condVar=TRUE), QQ=FALSE, likeDotplot=TRUE, reorder=FALSE)[["Subject"]]
ggsave(file="sleepstudy.png")
回答1:
First, thanks for putting "significant" in quotation marks ... everyone reading this should remember that significance doesn't have any statistical meaning in this context (it might be better to use a Z-statistic (value/std.error) criterion such as |Z|>1.5 or |Z|>1.75 instead, just to emphasize that this is not an inferential threshold ...)
I ended up getting a little carried away ... I decided that it would be better to refactor/modularize things a little bit, so I wrote an augment
method (designed to work with the broom
package) that constructs useful data frames from ranef.mer
objects ... once this is done, the manipulations you want are pretty easy.
I put the augment.ranef.mer
code at the end of my answer -- it's a bit long (you'll need to source it before you can run the code here). update: this augment
method has been part of the broom.mixed
package for a while now ...
library(broom)
library(reshape2)
library(plyr)
Apply the augment
method to the RE object:
rr <- ranef(fit,condVar=TRUE)
aa <- augment(rr)
names(aa)
## [1] "grp" "variable" "level" "estimate" "qq" "std.error"
## [7] "p" "lb" "ub"
Now the ggplot
code is pretty basic. I'm using geom_errorbarh(height=0)
rather than geom_pointrange()+coord_flip()
because ggplot2
can't use coord_flip
with facet_wrap(...,scales="free")
...
## Q-Q plot:
g0 <- ggplot(aa,aes(estimate,qq,xmin=lb,xmax=ub))+
geom_errorbarh(height=0)+
geom_point()+facet_wrap(~variable,scale="free_x")
## regular caterpillar plot:
g1 <- ggplot(aa,aes(estimate,level,xmin=lb,xmax=ub))+
geom_errorbarh(height=0)+
geom_vline(xintercept=0,lty=2)+
geom_point()+facet_wrap(~variable,scale="free_x")
Now find the levels you want to keep:
aa2 <- ddply(aa,c("grp","level"),
transform,
keep=any(p<0.05))
aa3 <- subset(aa2,keep)
Update caterpillar plot with only levels with "significant" slopes or intercepts:
g1 %+% aa3
If you only wanted to highlight "significant" levels rather than removing "non-significant" levels entirely
ggplot(aa2,aes(estimate,level,xmin=lb,xmax=ub,colour=factor(keep)))+
geom_errorbarh(height=0)+
geom_vline(xintercept=0,lty=2)+
geom_point()+facet_wrap(~variable,scale="free_x")+
scale_colour_manual(values=c("black","red"),guide=FALSE)
##' @importFrom reshape2 melt
##' @importFrom plyr ldply name_rows
augment.ranef.mer <- function(x,
ci.level=0.9,
reorder=TRUE,
order.var=1) {
tmpf <- function(z) {
if (is.character(order.var) && !order.var %in% names(z)) {
order.var <- 1
warning("order.var not found, resetting to 1")
}
## would use plyr::name_rows, but want levels first
zz <- data.frame(level=rownames(z),z,check.names=FALSE)
if (reorder) {
## if numeric order var, add 1 to account for level column
ov <- if (is.numeric(order.var)) order.var+1 else order.var
zz$level <- reorder(zz$level, zz[,order.var+1], FUN=identity)
}
## Q-Q values, for each column separately
qq <- c(apply(z,2,function(y) {
qnorm(ppoints(nrow(z)))[order(order(y))]
}))
rownames(zz) <- NULL
pv <- attr(z, "postVar")
cols <- 1:(dim(pv)[1])
se <- unlist(lapply(cols, function(i) sqrt(pv[i, i, ])))
## n.b.: depends on explicit column-major ordering of se/melt
zzz <- cbind(melt(zz,id.vars="level",value.name="estimate"),
qq=qq,std.error=se)
## reorder columns:
subset(zzz,select=c(variable, level, estimate, qq, std.error))
}
dd <- ldply(x,tmpf,.id="grp")
ci.val <- -qnorm((1-ci.level)/2)
transform(dd,
p=2*pnorm(-abs(estimate/std.error)), ## 2-tailed p-val
lb=estimate-ci.val*std.error,
ub=estimate+ci.val*std.error)
}
来源:https://stackoverflow.com/questions/34344599/a-caterpillar-plot-of-just-the-significant-random-effects-from-a-mixed-effects