Question
I am trying to combine a percentage histogram with facet_wrap, but the percentages are calculated from all the data rather than within each group. I would like each histogram to show the distribution within its group, not relative to the whole population. I know it is possible to build several plots and combine them with multiplot.
library(ggplot2)
library(scales)
library(dplyr)
set.seed(1)
df <- data.frame(age = runif(900, min = 10, max = 100),
                 group = rep(c("a", "b", "c", "d", "e", "f", "g", "h", "i"), 100))
tmp <- df %>%
  mutate(group = "ALL")
df <- rbind(df, tmp)
ggplot(df, aes(age)) +
  geom_histogram(aes(y = (..count..)/sum(..count..)), binwidth = 5) +
  scale_y_continuous(labels = percent) +
  facet_wrap(~ group, ncol = 5)
Answer 1:
Try y = stat(density) (or y = ..density.. prior to ggplot2 version 3.0.0; after_stat(density) from version 3.3.0 onward) instead of y = (..count..)/sum(..count..):
ggplot(df, aes(age, group = group)) +
  geom_histogram(aes(y = stat(density) * 5), binwidth = 5) +
  scale_y_continuous(labels = percent) +
  facet_wrap(~ group, ncol = 5)
From ?geom_histogram, under "Computed variables":
density: density of points in bin, scaled to integrate to 1
We multiply by 5 (the bin width) because the y-axis is a density (the bar areas integrate to 1), not a percentage (the bar heights sum to 1). Multiplying each density by the bin width gives the share of observations in that bin. See Hadley's comment (thanks to @MariuszSiatka).
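The density-to-proportion relationship can be checked without ggplot2. This is a minimal sketch using base R's hist(), not part of the original answer; the explicit breaks are an assumption chosen to cover the simulated age range, so the bin edges may differ from those geom_histogram picks.

```r
# Simulated ages, as in the question
set.seed(1)
age <- runif(900, min = 10, max = 100)

binwidth <- 5
breaks <- seq(10, 100, by = binwidth)  # assumed bin edges, 18 bins of width 5

# plot = FALSE returns the bin counts and densities without drawing
h <- hist(age, breaks = breaks, plot = FALSE)

# Density times bin width is the share of observations in each bin
prop_from_density <- h$density * binwidth
prop_from_counts <- h$counts / sum(h$counts)

all.equal(prop_from_density, prop_from_counts)  # the two should agree
sum(prop_from_density)                          # shares add up to 1
```

With equal-width bins the two calculations agree, which is why multiplying the density by the bin width turns the y-axis into a per-bin percentage.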
Answer 2:
While it seems facet_wrap does not run the special geom_histogram percentage calculation within each subset, consider building a list of plots separately and then arranging them together in a grid.
Specifically, call by to run your ggplots on subsets of group, and then call gridExtra::grid.arrange() (actual package method) to somewhat mimic facet_wrap:
library(ggplot2)
library(scales)
library(gridExtra)
...
grp_plots <- by(df, df$group, function(sub){
  ggplot(sub, aes(age)) +
    geom_histogram(aes(y = (..count..)/sum(..count..)), binwidth = 5) +
    scale_y_continuous(labels = percent) + ggtitle(sub$group[[1]]) +
    theme(plot.title = element_text(hjust = 0.5))
})
grid.arrange(grobs = grp_plots, ncol = 5)
However, to avoid the repeated y-axis and x-axis, consider conditionally setting the theme within the by call, assuming you know your groups ahead of time and they are a reasonable handful in number.
grp_plots <- by(df, df$group, function(sub){
  # BASE GRAPH
  p <- ggplot(sub, aes(age)) +
    geom_histogram(aes(y = (..count..)/sum(..count..)), binwidth = 5) +
    scale_y_continuous(labels = percent) + ggtitle(sub$group[[1]])

  # CONDITIONAL theme() CALLS
  if (sub$group[[1]] %in% c("a")) {
    # group a: keep the y-axis, hide the x-axis
    p <- p + theme(plot.title = element_text(hjust = 0.5), axis.title.x = element_blank(),
                   axis.text.x = element_blank(), axis.ticks.x = element_blank())
  } else if (sub$group[[1]] %in% c("f")) {
    # group f: keep both axes
    p <- p + theme(plot.title = element_text(hjust = 0.5))
  } else if (sub$group[[1]] %in% c("b", "c", "d", "e")) {
    # groups b to e: hide both axes
    p <- p + theme(plot.title = element_text(hjust = 0.5), axis.title.y = element_blank(),
                   axis.text.y = element_blank(), axis.ticks.y = element_blank(),
                   axis.title.x = element_blank(), axis.text.x = element_blank(),
                   axis.ticks.x = element_blank())
  } else {
    # all remaining groups: keep the x-axis, hide the y-axis
    p <- p + theme(plot.title = element_text(hjust = 0.5), axis.title.y = element_blank(),
                   axis.text.y = element_blank(), axis.ticks.y = element_blank())
  }
  return(p)
})
grid.arrange(grobs = grp_plots, ncol = 5)
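To see numerically why the denominator matters, here is a base-R sketch (not from the original answers) that rebuilds the question's data and compares a pooled denominator with a per-group one. The bin edges from cut() are an assumption and need not match the bins geom_histogram would choose; only the totals are of interest.

```r
# Rebuild the question's data: nine groups of 100 rows plus a pooled "ALL" copy
set.seed(1)
df <- data.frame(age = runif(900, min = 10, max = 100),
                 group = rep(c("a", "b", "c", "d", "e", "f", "g", "h", "i"), 100))
df <- rbind(df, transform(df, group = "ALL"))

# Assumed bin edges: width 5 across the simulated age range
bins <- cut(df$age, breaks = seq(10, 100, by = 5), include.lowest = TRUE)
counts <- table(df$group, bins)  # one row per group, one column per bin

# Pooled denominator: every bin is divided by the total number of rows
pooled <- counts / sum(counts)

# Per-group denominator: every bin is divided by its own group's row count
per_group <- prop.table(counts, margin = 1)

rowSums(pooled)     # each group's bars add up to its share of all rows
rowSums(per_group)  # each group's bars add up to 1
```

Dividing by the pooled total makes each panel's bars sum to that group's share of the whole data set, whereas dividing within each subset, as the by() approach does, makes every panel sum to 100%.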
Source: https://stackoverflow.com/questions/52690318/percentage-histogram-with-facet-wrap