Question
I'm looking to plot the following histograms:
library(palmerpenguins)
library(tidyverse)
penguins %>%
ggplot(aes(x=bill_length_mm, fill = species)) +
geom_histogram() +
facet_wrap(~species)
I would like to overlay a normal distribution on each histogram, using that species' mean and standard deviation.
Of course, I'm aware that I could compute the group-specific mean and SD before the ggplot call, but I wonder whether there is a smarter/faster way to do this.
I have tried:
penguins %>%
ggplot(aes(x=bill_length_mm, fill = species)) +
geom_histogram() +
facet_wrap(~species) +
stat_function(fun = dnorm)
But this only gives me a thin line at the bottom of each panel.
Any ideas? Thanks!
Edit: I guess what I'm trying to recreate is this simple command from Stata:
hist bill_length_mm, by(species) normal
which draws a histogram for each species with a normal density overlaid.
I understand that there are some suggestions here: using stat_function and facet_wrap together in ggplot2 in R
But I'm specifically looking for a short answer that does not require me creating a separate function.
Answer 1:
A while ago I more or less automated the drawing of theoretical densities with a function in the ggh4x package I wrote, which you might find convenient. You just have to make sure that the histogram and the theoretical density are on the same scale (for example, counts per x-axis unit).
library(palmerpenguins)
library(tidyverse)
library(ggh4x)
penguins %>%
ggplot(aes(x=bill_length_mm, fill = species)) +
geom_histogram(binwidth = 1) +
stat_theodensity(aes(y = after_stat(count))) +
facet_wrap(~species)
#> Warning: Removed 2 rows containing non-finite values (stat_bin).
You can vary the bin width of the histogram, but then you have to adjust the theoretical density count too. Typically you'd multiply it by the binwidth.
penguins %>%
ggplot(aes(x=bill_length_mm, fill = species)) +
geom_histogram(binwidth = 2) +
stat_theodensity(aes(y = after_stat(count)*2)) +
facet_wrap(~species)
#> Warning: Removed 2 rows containing non-finite values (stat_bin).
Created on 2021-01-27 by the reprex package (v0.3.0)
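To see why multiplying by the binwidth puts the curve on the histogram's scale, here is a minimal base R sketch. It uses simulated values rather than the penguins data, and it assumes that the count variable is the density times the number of observations (as with ggplot2's stat_density). The names x, binwidth, and expected are made up for illustration.

```r
# Simulated stand-in for one species' bill lengths (hypothetical values).
set.seed(1)
x <- rnorm(150, mean = 40, sd = 3)

binwidth <- 2
# Breaks that cover the whole range of x in steps of `binwidth`.
breaks <- seq(floor(min(x)) - binwidth, ceiling(max(x)) + binwidth, by = binwidth)
h <- hist(x, breaks = breaks, plot = FALSE)

# Fitted normal density evaluated at the bin midpoints.
dens <- dnorm(h$mids, mean = mean(x), sd = sd(x))

# Expected count per bin is approximately density * n * binwidth.
expected <- dens * length(x) * binwidth

# The observed counts sum to n; the scaled density sums to roughly n as well.
sum(h$counts)
sum(expected)
```

With binwidth = 1 the last factor drops out, which is why the first example needs no extra multiplication.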
If this is too much of a hassle, you can always convert the histogram to density instead of the density to counts.
penguins %>%
ggplot(aes(x=bill_length_mm, fill = species)) +
geom_histogram(aes(y = after_stat(density))) +
stat_theodensity() +
facet_wrap(~species)
Answer 2:
While the ggh4x package is the way to go in this case, a more generalizable approach is to use tapply together with the PANEL variable, which is added to the data when a facet is applied.
penguins %>%
ggplot(aes(x=bill_length_mm, fill = species)) +
geom_histogram(aes(y = after_stat(density)), bins = 30) +
facet_wrap(~species) +
geom_line(aes(y = dnorm(bill_length_mm,
                        mean = tapply(bill_length_mm, species, mean, na.rm = TRUE)[PANEL],
                        sd = tapply(bill_length_mm, species, sd, na.rm = TRUE)[PANEL])))
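The indexing trick can be checked outside of ggplot2. The sketch below is base R only and uses a simulated data frame d in place of penguins; the PANEL column is built by hand on the assumption that the panel order follows the factor levels of species, which is what facet_wrap(~species) appears to do here.

```r
# Hypothetical stand-in for the penguins data: three groups with different means.
set.seed(42)
d <- data.frame(
  species = factor(rep(c("Adelie", "Chinstrap", "Gentoo"), each = 50)),
  bill_length_mm = c(rnorm(50, 39, 2.7), rnorm(50, 49, 3.3), rnorm(50, 47.5, 3.1))
)

# Mimic the PANEL factor that faceting adds (assumed to follow the species levels).
d$PANEL <- factor(as.integer(d$species))

# One mean and one SD per species, as a named array.
group_mean <- tapply(d$bill_length_mm, d$species, mean, na.rm = TRUE)
group_sd   <- tapply(d$bill_length_mm, d$species, sd, na.rm = TRUE)

# Indexing by a factor uses its integer codes, so each row gets its own group's value.
row_mean <- as.numeric(group_mean[d$PANEL])
row_sd   <- as.numeric(group_sd[d$PANEL])

# Normal density of each observation under its own group's mean and SD.
d$normal_density <- dnorm(d$bill_length_mm, mean = row_mean, sd = row_sd)
head(d)
```

Because tapply returns the groups in factor-level order, the lookup only lines up if the panels are laid out in that same order.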
Source: https://stackoverflow.com/questions/65924407/ggplot-add-normal-distribution-while-using-facet-wrap