ggplot add Normal Distribution while using `facet_wrap` [duplicate]

Submitted by 谁说胖子不能爱 on 2021-02-19 07:52:51

Question


I'm looking to plot the following histograms:

library(palmerpenguins)
library(tidyverse)

penguins %>% 
  ggplot(aes(x=bill_length_mm, fill = species)) +
  geom_histogram() + 
  facet_wrap(~species)

For each histogram, I would like to overlay a normal distribution curve that uses that species' own mean and standard deviation.

Of course, I know I could compute the group-specific mean and SD before building the ggplot, but I wonder whether there is a smarter or faster way to do this.
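For reference, the "compute it beforehand" route mentioned above might look roughly like this. It is a minimal sketch, assuming ggplot2 and palmerpenguins are installed; the object names `peng`, `stats`, `grid_x` and `curves` are illustrative, and the histogram is put on the density scale so the unscaled normal curves line up with the bars.

```r
library(ggplot2)
library(palmerpenguins)

# Keep only rows with a bill length so mean(), sd() and the histogram use the same data.
peng <- as.data.frame(penguins[!is.na(penguins$bill_length_mm), ])

# One row per species with its sample mean and standard deviation.
stats <- do.call(rbind, lapply(split(peng, peng$species), function(d) {
  data.frame(species = d$species[1],
             mu = mean(d$bill_length_mm),
             sigma = sd(d$bill_length_mm))
}))

# Evaluate each species' normal density on a common grid of bill lengths.
grid_x <- seq(min(peng$bill_length_mm), max(peng$bill_length_mm), length.out = 200)
curves <- do.call(rbind, lapply(seq_len(nrow(stats)), function(i) {
  data.frame(species = stats$species[i],
             bill_length_mm = grid_x,
             density = dnorm(grid_x, mean = stats$mu[i], sd = stats$sigma[i]))
}))

# Histogram on the density scale, plus the matching curve in each facet
# (the curves data has a species column, so facet_wrap() splits it too).
p <- ggplot(peng, aes(x = bill_length_mm, fill = species)) +
  geom_histogram(aes(y = after_stat(density)), binwidth = 1) +
  geom_line(data = curves, aes(y = density)) +
  facet_wrap(~species)
print(p)
```

This works, but it is exactly the bookkeeping the question hopes to avoid, which is what the answers below address.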

I have tried:

penguins %>% 
  ggplot(aes(x=bill_length_mm, fill = species)) +
  geom_histogram() + 
  facet_wrap(~species) + 
  stat_function(fun = dnorm)

But this only gives me a thin, flat line along the bottom of each panel.

Any ideas? Thanks!
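Why the attempt above draws a flat line: `stat_function(fun = dnorm)` calls `dnorm()` with its defaults, `mean = 0` and `sd = 1`, while bill lengths are in the tens of millimetres, and the histogram's y axis is in counts rather than densities. The base R check below illustrates the first point; the bill lengths 32, 45 and 60 are illustrative values roughly spanning the data, and the mean 45 / SD 5 pair is made up, not an actual species estimate.

```r
# Illustrative bill lengths (mm) roughly spanning the penguins data.
bill_lengths <- c(32, 45, 60)

# dnorm() with its defaults (mean = 0, sd = 1): these points lie 32+ SDs from 0,
# so the density is effectively zero and the curve hugs the x axis.
std_normal <- dnorm(bill_lengths)
print(std_normal)

# With a mean and SD on the data's own scale (hypothetical values 45 and 5),
# the density is clearly non-zero, though still tiny next to histogram counts.
shifted <- dnorm(bill_lengths, mean = 45, sd = 5)
print(shifted)
```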

Edit: I guess what I'm trying to recreate is this simple Stata command:

hist bill_length_mm, by(species) normal

which draws one histogram per species, each with a normal density curve overlaid.

I'm aware of the suggestions in the question "using stat_function and facet_wrap together in ggplot2 in R", but I'm specifically looking for a short answer that doesn't require me to create a separate function.


Answer 1:


A while ago I more or less automated drawing theoretical densities with a function in my ggh4x package, which you might find convenient. You just have to make sure that the histogram and the theoretical density are on the same scale (for example, counts per x-axis unit).
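The scale-matching requirement can be written out. Assuming n non-missing observations, bins of width w with midpoints m_k, and a fitted density \hat f (notation introduced here for illustration), the expected height of bar k is approximately:

```latex
\mathbb{E}[\text{count}_k] \;=\; n \int_{m_k - w/2}^{m_k + w/2} \hat f(x)\,dx \;\approx\; n\,w\,\hat f(m_k)
```

Assuming the `count` computed by `stat_theodensity()` is the density times n (as `count` is for ggplot2's own `stat_density()`), it sits on the "counts per x-axis unit" scale n \hat f(x), so it matches the bars only when w = 1, which is why the first example below uses `binwidth = 1`. Dividing both sides by n w instead gives the density scale, where \hat f(x) itself matches the bars.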

library(palmerpenguins)
library(tidyverse)
library(ggh4x)

penguins %>% 
  ggplot(aes(x=bill_length_mm, fill = species)) +
  geom_histogram(binwidth = 1) + 
  stat_theodensity(aes(y = after_stat(count))) +
  facet_wrap(~species)
#> Warning: Removed 2 rows containing non-finite values (stat_bin).

You can vary the bin size of the histogram, but then you also have to rescale the theoretical density's count, typically by multiplying it by the binwidth.
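A quick numerical check of the binwidth rule, using base R's `hist()` on simulated data rather than the penguins (the sample size, mean 45 and SD 3 are arbitrary assumptions): for each binwidth w, the bar counts track n * w * dnorm(midpoint), so the curve has to be multiplied by w to line up.

```r
set.seed(42)
n <- 5000
x <- rnorm(n, mean = 45, sd = 3)  # simulated, hypothetical bill lengths
mu <- mean(x)
sigma <- sd(x)

# Compare observed bar heights with the normal curve scaled by n * w.
check_binwidth <- function(w) {
  breaks <- seq(floor(min(x)), ceiling(max(x)) + w, by = w)
  h <- hist(x, breaks = breaks, plot = FALSE)
  expected <- n * w * dnorm(h$mids, mean = mu, sd = sigma)
  c(total_ratio = sum(expected) / n,        # close to 1 when the scaling is right
    correlation = cor(h$counts, expected))  # close to 1 when the shapes agree
}

res1 <- check_binwidth(1)
res2 <- check_binwidth(2)
print(rbind(res1, res2))
```

Dropping the `w` factor in `expected` would leave the w = 2 curve at half the height of the bars, which is the mismatch the `*2` in the next example corrects.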

penguins %>% 
  ggplot(aes(x=bill_length_mm, fill = species)) +
  geom_histogram(binwidth = 2) + 
  stat_theodensity(aes(y = after_stat(count)*2)) +
  facet_wrap(~species)
#> Warning: Removed 2 rows containing non-finite values (stat_bin).

Created on 2021-01-27 by the reprex package (v0.3.0)

If this is too much of a hassle, you can always convert the histogram to the density scale instead of converting the density to counts.
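The density-scale route sidesteps the binwidth because a density histogram divides each count by n * w. A small base R sketch on simulated data (arbitrary parameters, not the penguins) checks that `hist()`'s `density` component is exactly counts / (n * w) and that the bar areas sum to one for several binwidths:

```r
set.seed(7)
x <- rnorm(2000, mean = 45, sd = 3)  # simulated, hypothetical bill lengths
n <- length(x)

for (w in c(0.5, 1, 2)) {
  breaks <- seq(floor(min(x)), ceiling(max(x)) + w, by = w)
  h <- hist(x, breaks = breaks, plot = FALSE)
  # Density scale: each count divided by n times the binwidth.
  stopifnot(isTRUE(all.equal(h$density, h$counts / (n * w))))
  # Bar areas sum to 1, so an unscaled dnorm() curve is directly comparable.
  stopifnot(isTRUE(all.equal(sum(h$density * w), 1)))
}
```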

penguins %>% 
  ggplot(aes(x=bill_length_mm, fill = species)) +
  geom_histogram(aes(y = after_stat(density))) + 
  stat_theodensity() +
  facet_wrap(~species)



Answer 2:


While the ggh4x package is the way to go in this case, a more general approach uses `tapply()` together with the `PANEL` variable, which ggplot2 adds to the data when a facet is applied.

library(palmerpenguins)
library(tidyverse)

penguins %>% 
  ggplot(aes(x=bill_length_mm, fill = species)) +
  geom_histogram(aes(y = after_stat(density)), bins = 30) + 
  facet_wrap(~species) + 
  # PANEL (added by the facet) picks each row's species mean/SD out of tapply()'s results
  geom_line(aes(y = dnorm(bill_length_mm,
                          mean = tapply(bill_length_mm, species, mean, na.rm = TRUE)[PANEL],
                          sd = tapply(bill_length_mm, species, sd, na.rm = TRUE)[PANEL])))
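The `PANEL` lookup in this answer relies on a base R detail: indexing a vector with a factor uses the factor's integer codes, not its labels. With `facet_wrap(~species)`, panel k shows the k-th species level (assuming every level appears in the data), so the codes of `PANEL` line up with the positions in `tapply()`'s result. A toy sketch with made-up numbers, where `panel` is a hypothetical stand-in for ggplot2's `PANEL` column:

```r
# Hypothetical toy data standing in for bill_length_mm and species.
len <- c(38, 40, 47, 49, 48, 46)
sp  <- factor(c("Adelie", "Adelie", "Chinstrap", "Chinstrap", "Gentoo", "Gentoo"))

# Stand-in for PANEL: a factor whose labels ("1", "2", "3") differ from the species names.
panel <- factor(as.integer(sp))

group_mean <- tapply(len, sp, mean)  # one value per level: Adelie, Chinstrap, Gentoo
group_sd   <- tapply(len, sp, sd)

# Factor indices are converted with as.integer(), so each row gets its own group's value.
row_mean <- as.vector(group_mean[panel])
row_sd   <- as.vector(group_sd[panel])

dens <- dnorm(len, mean = row_mean, sd = row_sd)
print(cbind(len, row_mean, row_sd, dens))
```

If a species level were absent from the data, the facet would drop it while `tapply()` would keep an `NA` slot for it, so the codes could shift; with the complete penguins data this does not arise.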



Source: https://stackoverflow.com/questions/65924407/ggplot-add-normal-distribution-while-using-facet-wrap
