How can I create a histogram for all variables in a data set with minimal effort in R?

面向向阳花 2021-01-05 02:45

Exploring a new data set: What is the easiest, quickest way to visualise many (all) variables?

Ideally, the output shows the histograms next to each other with minim

1 Answer
  • 2021-01-05 03:12

    There may be three broad approaches:

    1. Commands from packages, such as hist.data.frame() from Hmisc
    2. Looping over variables or similar macro constructs
    3. Stacking variables and using facets

    Packages

    Other commands available that may be helpful:

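    library(ggplot2)  # provides the mpg example data set
    library(plyr)     # loaded here for mutate() in the stacking code
    library(psych)    # provides multi.hist()
    # multi.hist(mpg)  # error: mpg contains non-numeric columns
    multi.hist(as.data.frame(mpg)[, sapply(mpg, is.numeric)])  # numeric columns only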
    

    Or perhaps multhist() from plotrix, which I haven't explored. Neither of them offers the flexibility I was looking for.

    Loops

    When I was an R beginner, everyone advised me to stay away from loops. So I did, but perhaps they are worth a try here. Any suggestions are very welcome, in particular on how to combine the graphs into one file.
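
    A minimal sketch of the loop approach, using base R only. The helper name plot_all_histograms(), the three-column layout, and the built-in mtcars data set are illustrative assumptions, not part of the original answer. Opening a single pdf() device and setting par(mfrow = ...) puts all histograms on one page of one file:

    ```r
    # Hypothetical helper: one histogram per numeric column, all in a single PDF.
    plot_all_histograms <- function(df, file, ncol = 3) {
      num <- df[sapply(df, is.numeric)]        # keep numeric columns only
      stopifnot(length(num) > 0)               # nothing to plot otherwise
      nrow <- ceiling(length(num) / ncol)      # rows needed for the grid
      pdf(file, width = 4 * ncol, height = 3 * nrow)
      on.exit(dev.off())                       # close the device even on error
      par(mfrow = c(nrow, ncol))               # grid layout on the PDF page
      for (name in names(num)) {
        hist(num[[name]], main = name, xlab = name)
      }
      invisible(names(num))                    # names of the plotted columns
    }

    # Assumption: mtcars stands in for the data set being explored.
    out_file <- file.path(tempdir(), "mtcars-histograms.pdf")
    plotted <- plot_all_histograms(mtcars, out_file)
    ```

    The loop is short enough here that avoiding it gains little, and arguments such as breaks can be passed to hist() per variable if more control is needed.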

    Stacking

    My first suspicion was that stacking variables might get out of hand. However, it might be the best strategy for a reasonable set of variables.

    One example I came up with uses the melt function.

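    library(ggplot2)   # mpg data, ggplot(), ggsave()
    library(plyr)      # mutate()
    library(reshape2)  # melt()
    # Keep numeric columns only: mixed types would make value a character column
    mpgnum <- as.data.frame(mpg)[, sapply(mpg, is.numeric)]
    mpgid <- mutate(mpgnum, id = seq_len(nrow(mpgnum)))
    mpgstack <- melt(mpgid, id.vars = "id")
    pp <- ggplot(mpgstack, aes(x = value)) +
      geom_histogram(bins = 30) +
      facet_wrap(~variable, scales = "free")
    # pp + stat_bin(geom="text", aes(label=..count.., vjust=-1))
    ggsave("mpg-histograms.pdf", pp, scale = 2)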
    

    (As you can see I tried to put value labels on the bars for more information density, but that didn't go so well. The labels on the x-axis are also less than ideal.)
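
    The value labels mentioned above might be added along these lines. This is only a sketch, assuming ggplot2 3.3.0 or later (for after_stat() and expansion()); the bin count of 20 and the object names are arbitrary choices, and base R's stack() is swapped in for melt() to keep the example short. Both layers must use the same bins so that each label sits on its bar:

    ```r
    library(ggplot2)

    # Stack the numeric columns of mpg into two columns: values and ind
    mpg_num <- as.data.frame(mpg)[sapply(mpg, is.numeric)]
    mpg_long <- stack(mpg_num)

    n_bins <- 20  # arbitrary; shared by the bar layer and the text layer
    pp_labeled <- ggplot(mpg_long, aes(x = values)) +
      geom_histogram(bins = n_bins) +
      # Text layer: the same binning, drawn as counts above the bars
      stat_bin(geom = "text", aes(label = after_stat(count)),
               bins = n_bins, vjust = -0.5, size = 3) +
      # Extra headroom so the label on the tallest bar is not clipped
      scale_y_continuous(expand = expansion(mult = c(0, 0.15))) +
      facet_wrap(~ind, scales = "free")
    ```

    Empty bins are labelled with 0 in this sketch, which can look cluttered when there are many bins.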

    No solution here is perfect, and there won't be a one-size-fits-all command. But perhaps we can get closer to making the exploration of a new data set easy.
