Question
I'm currently plotting distributions of first differences from several regression models in ggplot2. To make the differences easier to interpret, I want to mark the 2.5th and 97.5th percentiles of each distribution. Since I will be making quite a few plots, and because the data is grouped in two dimensions (model and type), I would like to define and plot the respective percentiles within ggplot2 itself. Plotting the distributions using facets gets me exactly where I want to be, except for the percentiles. I could of course do this more manually, but I would ideally like a solution that still lets me use facet_grid, since this spares me a lot of hassle trying to fit the different plots together.
Here's an example using simulated data:
library(ggplot2)
# model and type are shorter than value; data.frame() recycles them to 1000 rows
df.example <- data.frame(model = rep(c("a", "b"), length.out = 500),
                         type = rep(c("t1", "t2", "t2", "t1"),
                                    length.out = 250),
                         value = rnorm(1000))
ggplot(df.example, aes(x = value)) +
  facet_grid(type ~ model) +
  geom_density(aes(fill = model, colour = model))
I've tried to add quantiles in two ways. The first one produces an error message:
ggplot(df.example, aes(x = value)) +
  facet_grid(. ~ model) +
  geom_density(aes(fill = model, colour = model)) +
  geom_vline(aes(x = value), xintercept = quantile(value, probs = c(.025, .975)))
Error in quantile(value, probs = c(0.025, 0.975)) : object 'value' not found
The second one gives me the quantiles of the complete variable rather than of each sub-density. That is, the plotted quantiles are identical for all four densities.
ggplot(df.example, aes(x = value)) +
  facet_grid(type ~ model) +
  geom_density(aes(fill = model, colour = model)) +
  geom_vline(xintercept = quantile(df.example$value, probs = c(.025, .975)))
I consequently wonder if there is a way to plot the specific quantiles for each subgroup within the ggplot2 environment?
Greatly appreciate any input.
Answer 1:
You can calculate the quantiles beforehand.
Using your example data:
library(dplyr)
# One row per model/type combination, holding that group's own percentiles
d2 <- df.example %>%
  group_by(model, type) %>%
  summarize(lower = quantile(value, probs = .025),
            upper = quantile(value, probs = .975))
And then plot like this:
# d2 carries the model and type columns, so each line lands in its own facet
ggplot(df.example, aes(x = value)) +
  facet_grid(type ~ model) +
  geom_density(aes(fill = model, colour = model)) +
  geom_vline(data = d2, aes(xintercept = lower)) +
  geom_vline(data = d2, aes(xintercept = upper))
Answer 2:
Use plyr (or dplyr, data.table) to precompute these values ...
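library(plyr)
set.seed(1)
# ...
# Two rows per model/type group: the 2.5% and the 97.5% quantile
df.q <- ddply(df.example, .(model, type),
              summarize, q = quantile(value, c(.025, .975)))
# p is assumed to be the faceted density plot from the question
p + geom_vline(aes(xintercept = q), data = df.q)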
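For the data.table route mentioned above, a minimal sketch might look like the following. It assumes the data.table and ggplot2 packages are installed, and it rebuilds the simulated data so the block runs on its own; the seed value is arbitrary.

```r
library(data.table)
library(ggplot2)

set.seed(1)  # arbitrary seed, only for reproducibility
# Same simulated data as in the question; shorter columns are recycled
df.example <- data.frame(model = rep(c("a", "b"), length.out = 500),
                         type = rep(c("t1", "t2", "t2", "t1"),
                                    length.out = 250),
                         value = rnorm(1000))

dt <- as.data.table(df.example)

# Two rows per model/type group: the 2.5% and the 97.5% quantile
df.q <- dt[, .(q = quantile(value, probs = c(.025, .975))),
           by = .(model, type)]

# df.q keeps the facet variables, so each line is drawn in its own panel
p <- ggplot(df.example, aes(x = value)) +
  facet_grid(type ~ model) +
  geom_density(aes(fill = model, colour = model)) +
  geom_vline(data = df.q, aes(xintercept = q))
p
```

Keeping both quantiles in a single long column means one geom_vline layer is enough.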
Answer 3:
Good question. The more general version of the same question is: how do you call functions on the subsetted datasets when using facets? This seems like a very useful feature and so I searched around but could not find anything about it.
The answers already given are excellent. Another option is to use multiplot() as a way of doing the faceting manually.
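One way to keep the per-group computation inside the plot call is to pass a function as a layer's data argument; ggplot2 then calls it with the plot's data frame. Note that the function receives the whole data set, not one facet at a time, so the grouping still has to happen inside it. The sketch below assumes that behaviour; group_quantiles is a hypothetical helper written for illustration, not a ggplot2 function, and the seed is arbitrary.

```r
library(ggplot2)

set.seed(1)  # arbitrary seed, only for reproducibility
# Same simulated data as in the question; shorter columns are recycled
df.example <- data.frame(model = rep(c("a", "b"), length.out = 500),
                         type = rep(c("t1", "t2", "t2", "t1"),
                                    length.out = 250),
                         value = rnorm(1000))

# Hypothetical helper: returns one row per model/type/quantile
group_quantiles <- function(d, probs = c(.025, .975)) {
  pieces <- split(d, list(d$model, d$type), drop = TRUE)
  do.call(rbind, lapply(pieces, function(g) {
    data.frame(model = g$model[1],
               type = g$type[1],
               prob = probs,
               q = unname(quantile(g$value, probs = probs)))
  }))
}

# The vline layer derives its data from the plot data via the helper
p <- ggplot(df.example, aes(x = value)) +
  facet_grid(type ~ model) +
  geom_density(aes(fill = model, colour = model)) +
  geom_vline(data = group_quantiles, aes(xintercept = q))
p
```

Because the helper is applied to whatever data the plot holds, the same layer can be reused across many plots without precomputing a separate summary table for each.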
Source: https://stackoverflow.com/questions/30568873/plot-quantiles-of-distribution-in-ggplot2-with-facets