I have a table of data with a column representing a lab value for each study subject (rows).
I want to generate a series of histograms showing the distribution of v
If you combine the tidyr and ggplot2 packages, you can use facet_wrap to make a quick set of histograms of each variable in your data.frame.
You need to reshape your data to long form with tidyr::gather, so that you have key and value columns, like so:
library(tidyr)
library(ggplot2)
# or `library(tidyverse)`
mtcars %>% gather() %>% head()
#>   key value
#> 1 mpg  21.0
#> 2 mpg  21.0
#> 3 mpg  22.8
#> 4 mpg  21.4
#> 5 mpg  18.7
#> 6 mpg  18.1
Using this as our data, we can map value as our x variable, and use facet_wrap to separate by the key column:
ggplot(gather(mtcars), aes(value)) +
  geom_histogram(bins = 10) +
  facet_wrap(~key, scales = 'free_x')
The scales = 'free_x' argument is necessary unless all of your variables are on a similar scale.
You can replace bins = 10 with anything that evaluates to a number, which may allow you to set them somewhat individually with some creativity. Alternatively, you can set binwidth, which may be more practical, depending on what your data looks like. Regardless, binning will take some finesse.
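In recent tidyr releases gather() is superseded by pivot_longer(), and a real table usually has an id column that should stay out of the plot. The sketch below shows the same faceting approach under those conditions. The data frame labs and its columns (id, glucose, sodium) are made up for illustration and are not from the question.

```r
library(tidyr)
library(ggplot2)

# Hypothetical data: one row per subject, one column per lab value
set.seed(1)
labs <- data.frame(
  id      = 1:50,
  glucose = rnorm(50, mean = 95, sd = 12),
  sodium  = rnorm(50, mean = 140, sd = 3)
)

# Reshape every column except id into key/value pairs
long <- pivot_longer(labs, cols = -id, names_to = "key", values_to = "value")

# One histogram panel per lab value, each with its own x axis
p <- ggplot(long, aes(value)) +
  geom_histogram(bins = 10) +
  facet_wrap(~key, scales = "free_x")
print(p)
```

Excluding id with cols = -id keeps the subject identifier from being plotted as if it were a measurement.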
I just came across the multi.hist() function from the psych package. It lets you quickly plot histograms for specific columns, and it looks like you can set different breaks for each column.
You could generate the plots in a for loop with something like this, if your data frame is named "df" and you want to generate histograms starting with column 2 (if column 1 is your id):
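for (col in 2:ncol(df)) {
  # title each plot with its column name instead of the default "df[, col]"
  hist(df[,col], main = names(df)[col], xlab = names(df)[col])
}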
The hist function automatically chooses a reasonable set of bins, or you can request the same number of bins for all histograms by adding the breaks argument. A single number is treated as a suggestion, since R rounds the breakpoints to "pretty" values:
hist(df[,col], breaks=10)
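If you would rather see all the histograms side by side than page through them, base R can lay them out on one device with par(mfrow = ...). This is only a sketch: the data frame and its column names are invented for the example, and it assumes column 1 is the id.

```r
# Hypothetical data: id in column 1, lab values in the remaining columns
set.seed(1)
df <- data.frame(
  id      = 1:50,
  glucose = rnorm(50, mean = 95, sd = 12),
  sodium  = rnorm(50, mean = 140, sd = 3)
)

value_cols <- names(df)[-1]

# Put the plots in a single row, remembering the old layout
old_par <- par(mfrow = c(1, length(value_cols)))

hists <- list()
for (col in value_cols) {
  # hist() draws the plot and invisibly returns the bin counts and breaks
  hists[[col]] <- hist(df[[col]], breaks = 10, main = col, xlab = col)
}

# Restore the previous layout
par(old_par)
```

Keeping the returned objects in hists lets you inspect the breaks R actually chose for each column, for example with hists$glucose$breaks.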
If you use RStudio, all your plots will automatically be kept in the Plots pane, where you can page through them. If not, you will need to save each plot to a separate file inside the loop, as explained here: http://www.r-bloggers.com/automatically-save-your-plots-to-a-folder/
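A minimal sketch of saving inside the loop, independent of the linked post: open a file device, draw, then close it. The data frame, the output folder, and the file naming scheme are all assumptions made for the example. pdf() is used because it is available in every R build, and png() can be swapped in the same way.

```r
# Hypothetical data: id in column 1, lab values in the remaining columns
set.seed(1)
df <- data.frame(
  id      = 1:50,
  glucose = rnorm(50, mean = 95, sd = 12),
  sodium  = rnorm(50, mean = 140, sd = 3)
)

# Assumed output location; change this to a folder of your choice
out_dir <- file.path(tempdir(), "histograms")
dir.create(out_dir, showWarnings = FALSE)

for (col in names(df)[-1]) {
  # One file per column, named after the column
  pdf(file.path(out_dir, paste0("hist_", col, ".pdf")))
  hist(df[[col]], main = col, xlab = col)
  dev.off()  # close the device so the file is written
}

list.files(out_dir)
```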