How can I make a Frequency distribution bar plot in ggplot2?

南旧 2021-01-26 00:25

Sample of the dataset.

nq
0.140843018
0.152855833
0.193245919
0.156860105
0.171658019
0.186281942
0.290739146
0.162779517
0.164694042
0.171658019
0.195866609
0.1         


        
2 answers
  •  南笙 (original poster)
    2021-01-26 00:50

    I thought this would be easy, but it turned out to be frustrating. Perhaps the "right" way is to transform your data before handing it to ggplot, as it looks like @DavidArenburg has done. But if you feel like hacking ggplot, here's what I ended up doing.

    First, some sample data.

    set.seed(15)
    # 100 draws from 1..25, weighted towards the small values
    dd <- data.frame(x = sample(1:25, 100, replace = TRUE, prob = 25:1))
    br <- seq(0, 25, by = 5) # break points
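For comparison with the attempts below, here is a minimal sketch of the transform-first route mentioned above. It is not @DavidArenburg's code; the names `binned` and `prop` are made up for illustration, and the block repeats the sample data so it runs on its own. It assumes a ggplot2 version that has `geom_col()` (2.2.0 or later).

```r
library(ggplot2)
library(scales)

# Same sample data and break points as above, repeated so the block is standalone
set.seed(15)
dd <- data.frame(x = sample(1:25, 100, replace = TRUE, prob = 25:1))
br <- seq(0, 25, by = 5)

# Bin the values before plotting; cut() builds right-closed bins (0,5], (5,10], ...
binned <- as.data.frame(table(bin = cut(dd$x, breaks = br)))
binned$prop <- binned$Freq / sum(binned$Freq)  # share of observations per bin

# Plain bar chart of the pre-computed counts, labelled with percentages
p <- ggplot(binned, aes(x = bin, y = Freq)) +
  geom_col() +
  geom_text(aes(label = percent(prop)), vjust = -0.5)
print(p)
```

Because the percentages are an ordinary column here, `percent()` is evaluated like any other function inside `aes()`, and nothing in ggplot's internals needs to change.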
    

    My first attempt was

    library(ggplot2)
    ggplot(dd, aes(x)) + 
        stat_bin(position="stack", breaks=br) + 
        geom_text(aes(y=..count.., label=..density..*..width.., ymax=..count..+1), 
            vjust=-.5, breaks=br, stat="bin")
    

    but that printed the labels as raw proportions rather than "pretty labels".

    So I thought I'd use the percent() function from the scales package to make it pretty. However, ggplot doesn't really make it possible to use functions with ..name.. variables, because it evaluates those expressions in the data.frame only (falling back just to baseenv()). It has no way to find a function from a package you attached, so this is when I turned to hacking. First I'll extract the "Layer" definition from ggplot2 and take map_statistic from it. (NOTE: this was done with "ggplot2_1.0.0" and is specific to that version; this is a private function that may change in future releases.)

    orig.map_statistic <- ggplot2:::Layer$map_statistic
    new.map_statistic <- orig.map_statistic
    body(new.map_statistic)[[9]]
    # stat_data <- as.data.frame(lapply(new, eval, data, baseenv()))
    

    Here's the line that's causing grief. I would prefer that the function also resolved names in the plot environment when they are not found in the data.frame. So I decided to change it with

    body(new.map_statistic)[[9]] <- quote(stat_data <- as.data.frame(lapply(new, eval, data, plot$plot_env)))
    assign("map_statistic", new.map_statistic, envir=ggplot2:::Layer)
    

    Now I can use functions with ..name.. variables, so I can do

    library(scales)
    ggplot(dd, aes(x)) + 
        stat_bin(position="stack", breaks=br) + 
        geom_text(aes(y=..count.., ymax=..count..+2, 
            label=percent(..density..*..width..)), 
            vjust=-.5, breaks=br, stat="bin")
    

    to get the same plot with percent-formatted labels above the bars.

    I'm not sure why ggplot has this default behavior. There could be a good reason for it, but I don't know what it is. Note that this changes how ggplot behaves for the rest of the session. You can change back to the default with

    assign("map_statistic", orig.map_statistic, envir=ggplot2:::Layer)
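The hack above is tied to ggplot2 1.0.0. As a hedged sketch only: assuming a much newer ggplot2 (3.3.0 or later, where `after_stat()` exists), the same labels can usually be had without patching anything, because expressions wrapped in `after_stat()` can call ordinary functions such as `percent()`. The block repeats the sample data so it runs on its own, and it uses `count / sum(count)`, which for a single group equals the `..density..*..width..` product used earlier.

```r
library(ggplot2)  # assumption: ggplot2 >= 3.3.0, which provides after_stat()
library(scales)

# Same sample data and break points as above
set.seed(15)
dd <- data.frame(x = sample(1:25, 100, replace = TRUE, prob = 25:1))
br <- seq(0, 25, by = 5)

p <- ggplot(dd, aes(x)) +
  geom_histogram(breaks = br) +
  geom_text(
    aes(
      y = after_stat(count),                          # place labels at bar height
      label = after_stat(percent(count / sum(count))) # share of all observations
    ),
    stat = "bin", breaks = br, vjust = -0.5
  )
print(p)
```

Nothing is modified for the rest of the session with this approach, so there is no default to restore afterwards.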
    
