How do I generate a histogram for each column of my table?

执念已碎 2020-12-14 10:05

I have a table of data with a column representing a lab value for each study subject (rows).

I want to generate a series of histograms showing the distribution of v

3 Answers
  • 2020-12-14 10:37

    If you combine the tidyr and ggplot2 packages, you can use facet_wrap to make a quick set of histograms of each variable in your data.frame.

    You need to reshape your data to long form with tidyr::gather() (superseded by pivot_longer() since tidyr 1.0.0, though still available), so that you have key and value columns, like so:

    library(tidyr)
    library(ggplot2)
    # or `library(tidyverse)`
    
    mtcars %>% gather() %>% head()
    #>   key value
    #> 1 mpg  21.0
    #> 2 mpg  21.0
    #> 3 mpg  22.8
    #> 4 mpg  21.4
    #> 5 mpg  18.7
    #> 6 mpg  18.1
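    If you would rather not add tidyr just for the reshape, base R's stack() produces the same long layout from a data frame of numeric columns, with the columns named values and ind instead of value and key. A minimal sketch on mtcars (the rest of this answer keeps using gather()):

    ```r
    # Reshape mtcars to long form with base R only.
    # stack() keeps every vector column and returns two columns:
    #   values: the data, one source column after another
    #   ind:    a factor naming the source column of each value
    long <- stack(mtcars)

    head(long)
    ```

    The ind column plays the role of key, so facet_wrap(~ind) works the same way in the plot that follows.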
    

    Using this as our data, we can map value to the x aesthetic and use facet_wrap() to split the plot into one panel per key:

    ggplot(gather(mtcars), aes(value)) + 
        geom_histogram(bins = 10) + 
        facet_wrap(~key, scales = 'free_x')
    

    Setting scales = 'free_x' gives each panel its own x-axis range; you need it unless all your columns are on a similar scale.

    Note that bins = 10 applies the same number of bins to every panel. You can replace 10 with any expression that evaluates to a number, but giving each column its own binning takes some creativity. Setting binwidth instead may be more practical, depending on what your data look like. Either way, binning takes some finesse.
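    In recent ggplot2 versions, binwidth also accepts a function of the x values, which is evaluated separately for each panel when there is no extra grouping, so every column can get its own bin width. The sketch below uses the Freedman-Diaconis rule (width = 2 × IQR / n^(1/3)); the rule and the helper name fd_binwidth are my choices for illustration, not something ggplot2 provides, and the plotting part only runs if ggplot2 and tidyr are installed:

    ```r
    # Hypothetical helper: Freedman-Diaconis bin width for a numeric vector.
    fd_binwidth <- function(x) {
      x <- x[is.finite(x)]
      w <- 2 * IQR(x) / length(x)^(1 / 3)
      # IQR can be 0 (or NA for empty input); fall back to an arbitrary width of 1.
      if (is.finite(w) && w > 0) w else 1
    }

    if (requireNamespace("ggplot2", quietly = TRUE) &&
        requireNamespace("tidyr", quietly = TRUE)) {
      library(ggplot2)
      p <- ggplot(tidyr::gather(mtcars), aes(value)) +
        geom_histogram(binwidth = fd_binwidth) +  # width computed from each panel's data
        facet_wrap(~key, scales = "free_x")
      print(p)
    }
    ```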

  • 2020-12-14 10:51

    I just came across the multi.hist() function from the psych package. It lets you quickly plot histograms of selected columns, and it looks like you can set different breaks for each column.
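    A minimal call, assuming psych is installed; the column selection here is arbitrary and only for illustration:

    ```r
    # Draw a grid of histograms, one per selected column, with psych::multi.hist().
    # Skips quietly if the psych package is not installed.
    cols <- c("mpg", "disp", "hp", "wt")
    if (requireNamespace("psych", quietly = TRUE)) {
      psych::multi.hist(mtcars[, cols])
    }
    ```

    See ?psych::multi.hist for the arguments that control the breaks and the grid layout.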

  • 2020-12-14 10:53

    You could generate the plots in a for loop like this, if your data frame is named df and you want histograms starting from column 2 (for example, because column 1 is an ID):

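    # Assumes your data frame is called df and column 1 is an ID
    for (col in 2:ncol(df)) {
        # df[[col]] returns the column as a plain vector (also works for tibbles)
        hist(df[[col]], main = names(df)[col], xlab = names(df)[col])
    }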
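    If you would rather see all the histograms on one page instead of one at a time, you can split the plotting device into a grid with par(mfrow = ...). A sketch, where plot_all_hists is a hypothetical helper name and mtcars stands in for your own data:

    ```r
    # Hypothetical helper: one histogram per selected column position, arranged in a grid.
    plot_all_hists <- function(df, cols = seq_along(df)) {
      n  <- length(cols)
      nc <- ceiling(sqrt(n))  # grid columns
      nr <- ceiling(n / nc)   # grid rows
      # Use a grid layout with slimmer margins; restore the old settings on exit.
      old <- par(mfrow = c(nr, nc), mar = c(4, 4, 2, 1))
      on.exit(par(old))
      res <- lapply(cols, function(i) {
        hist(df[[i]], main = names(df)[i], xlab = names(df)[i])
      })
      names(res) <- names(df)[cols]
      invisible(res)  # the "histogram" objects, in case you want the counts
    }

    h <- plot_all_hists(mtcars)
    ```

    With the layout from the question (ID in column 1), the call would be plot_all_hists(df, cols = 2:ncol(df)).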
    

    The hist function chooses the breaks automatically (Sturges' rule by default), or you can request a fixed number of bins for all histograms by adding the breaks argument:

    hist(df[[col]], breaks = 10)
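    A single number passed to breaks is only a suggestion: R rounds the break points with pretty(), so you may get a few more or fewer bins. To force an exact number of equal-width bins, pass a vector of break points instead. A sketch using mtcars$mpg as a stand-in column:

    ```r
    x <- mtcars$mpg

    # A single number is only a suggestion; the break points are rounded via pretty().
    suggested <- hist(x, breaks = 10, plot = FALSE)
    length(suggested$counts)

    # 11 equally spaced edges from min to max give exactly 10 bins.
    edges <- seq(min(x), max(x), length.out = 11)
    exact <- hist(x, breaks = edges, plot = FALSE)
    length(exact$counts)
    ```

    Inside the loop, replace x with df[[col]] and drop plot = FALSE to draw the histogram.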
    

    If you use RStudio, every plot is kept in the Plots pane, where you can page back through them. If not, you will need to save each plot to a separate file inside the loop, as explained here: http://www.r-bloggers.com/automatically-save-your-plots-to-a-folder/
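    A minimal sketch of the save-inside-the-loop approach using only base R's pdf() device, writing one file per column into a temporary folder; the output folder, the file names, and the use of mtcars as data are assumptions you would replace with your own:

    ```r
    # Save one histogram per column to its own PDF file.
    out_dir <- file.path(tempdir(), "histograms")  # assumed output folder
    dir.create(out_dir, showWarnings = FALSE)

    df <- mtcars  # stand-in for your data
    paths <- character(0)
    for (col in seq_along(df)) {  # use 2:ncol(df) to skip an ID column
      name <- names(df)[col]
      path <- file.path(out_dir, paste0(name, ".pdf"))
      pdf(path)   # open a new file-backed graphics device
      hist(df[[col]], main = name, xlab = name)
      dev.off()   # close the device so the file is written
      paths <- c(paths, path)
    }
    ```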
