Show percent % instead of counts in charts of categorical variables

梦如初夏 2020-11-22 06:06

I'm plotting a categorical variable, and instead of showing the counts for each category value, I'm looking for a way to get ggplot to display the percentages.

8 answers
  • 2020-11-22 06:15

    Since this was answered there have been some meaningful changes to the ggplot syntax. Summing up the discussion in the comments above:

     library(ggplot2)
     library(scales)
    
     p <- ggplot(mydataf, aes(x = foo)) +  
            geom_bar(aes(y = (..count..)/sum(..count..))) + 
            scale_y_continuous(labels = percent)  ## syntax as of ggplot2 3.0.0
    

    Here's a reproducible example using mtcars:

     ggplot(mtcars, aes(x = factor(hp))) +  
            geom_bar(aes(y = (..count..)/sum(..count..))) + 
            scale_y_continuous(labels = percent) ## version 3.0.0
    

    This question is currently the #1 hit on Google for 'ggplot count vs percentage histogram', so hopefully this helps distill all the information currently housed in comments on the accepted answer.
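
    For readers on newer releases, here is a minimal sketch of the same idea written with after_stat(). It assumes ggplot2 3.3.0 or later, where after_stat() is available (the ..count.. notation is deprecated from 3.4.0 on). The choice of cyl as the variable and the name bar_heights are illustrative only, not part of the original answer.

    ```r
    library(ggplot2)

    # after_stat(count) refers to the per-bar count computed by stat_count();
    # dividing by its sum turns each bar height into a share of all rows.
    p <- ggplot(mtcars, aes(x = factor(cyl))) +
      geom_bar(aes(y = after_stat(count / sum(count)))) +
      scale_y_continuous(labels = scales::percent) +
      labs(x = "cyl", y = "share of cars")

    # layer_data() exposes the computed bar heights, which is handy for
    # checking that they really are proportions of the whole.
    bar_heights <- layer_data(p)$y
    print(bar_heights)
    ```

    Printing p draws the chart; the three values in bar_heights should add up to 1.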

    Remark: If hp is not set as a factor, ggplot returns:

  • 2020-11-22 06:19

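    This modified code should work:

    p <- ggplot(mydataf, aes(x = foo)) + 
        geom_bar(aes(y = (..count..)/sum(..count..))) + 
        # formatter = 'percent' only works in ggplot2 versions before 0.9.0;
        # later versions take the labels argument instead
        scale_y_continuous(labels = scales::percent)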
    

    If your data has NAs and you don't want them included in the plot, pass na.omit(mydataf) as the data argument to ggplot.
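
    One thing worth keeping in mind is that na.omit() drops every row that has an NA in any column, not only in the plotted one. The sketch below contrasts that with filtering on the plotted column alone. The data frame contents and the column bar are made up for illustration, and after_stat() assumes ggplot2 3.3.0 or later.

    ```r
    library(ggplot2)

    # Hypothetical data: foo is the plotted variable, bar is an unrelated column.
    mydataf <- data.frame(
      foo = c("a", "b", NA, "a", "b", "a"),
      bar = c(1, NA, 3, 4, 5, 6)
    )

    all_complete <- na.omit(mydataf)                # drops rows 2 and 3
    foo_complete <- mydataf[!is.na(mydataf$foo), ]  # drops row 3 only

    # Plot shares based on the rows where foo itself is present.
    p <- ggplot(foo_complete, aes(x = foo)) +
      geom_bar(aes(y = after_stat(count / sum(count)))) +
      scale_y_continuous(labels = scales::percent)
    ```

    Which of the two subsets is appropriate depends on whether the other columns matter for the chart; the percentages differ because the denominators differ.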

    hope this helps.

  • 2020-11-22 06:23

    Note that if your variable is continuous, you will have to use geom_histogram() instead, because it groups the values into bins before counting them.

    df <- data.frame(V1 = rnorm(100))
    
    ggplot(df, aes(x = V1)) +  
      geom_histogram(aes(y = (..count..)/sum(..count..))) 
    
    # If you use geom_bar() with factor(V1), each distinct value of V1 is treated as
    # its own category. That makes no sense here, because the variable is truly
    # continuous. It worked for hp in mtcars (see the previous answer) because hp is
    # not really continuous (check unique(mtcars$hp)), and you may want to see each
    # value of that variable rather than group it into bins.
    ggplot(df, aes(x = factor(V1))) +  
      geom_bar(aes(y = (..count..)/sum(..count..))) 
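
    A small sketch of the histogram variant with a percent-formatted axis follows. The seed and bins = 20 are arbitrary choices made for illustration, and after_stat() assumes ggplot2 3.3.0 or later.

    ```r
    library(ggplot2)

    set.seed(42)  # fixed seed so the sketch is repeatable
    df <- data.frame(V1 = rnorm(100))

    # Each bar height is that bin's share of all observations.
    p <- ggplot(df, aes(x = V1)) +
      geom_histogram(aes(y = after_stat(count / sum(count))), bins = 20) +
      scale_y_continuous(labels = scales::percent)

    # One proportion per bin; together they should add up to 1.
    bin_shares <- layer_data(p)$y
    ```

    Changing the number of bins changes the individual bar heights, but the shares still sum to 100%.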
    
  • 2020-11-22 06:25

    With ggplot2 version 2.1.0 it is

    + scale_y_continuous(labels = scales::percent)
    
  • 2020-11-22 06:26

    If you want percentage labels but actual Ns on the y axis, try this:

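    library(ggplot2)
    library(scales)

    # Bars show raw counts; text labels show each bar's share of the total.
    # stat = "count" keeps the labels aligned with geom_bar(), which counts
    # distinct x values (stat = "bin" would bin the labels separately).
    perbar <- function(xx) {
      ggplot(data.frame(xx), aes(x = xx)) +
        geom_bar(aes(y = ..count..), fill = "orange") +
        geom_text(aes(y = ..count..,
                      label = scales::percent((..count..) / sum(..count..))),
                  stat = "count", colour = "darkgreen")
    }
    perbar(mtcars$disp)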
    
  • 2020-11-22 06:32

    As of March 2017, with ggplot2 2.2.1, I think the best solution is the one explained in Hadley Wickham's book R for Data Science:

    ggplot(mydataf) + stat_count(mapping = aes(x=foo, y=..prop.., group=1))
    

    stat_count computes two variables: count is used by default, but you can choose to use prop which shows proportions.
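
    The group = 1 part matters because prop is computed within each group. The sketch below, which uses mtcars and cyl purely for illustration and assumes ggplot2 3.3.0 or later for after_stat(), compares the result with and without it.

    ```r
    library(ggplot2)

    # With group = 1 all rows form a single group, so prop is each bar's
    # share of the whole data set.
    p_grouped <- ggplot(mtcars) +
      stat_count(aes(x = factor(cyl), y = after_stat(prop), group = 1)) +
      scale_y_continuous(labels = scales::percent)

    # Without group = 1 every level of a discrete x is its own group,
    # so each bar is 100% of its own group.
    p_ungrouped <- ggplot(mtcars) +
      stat_count(aes(x = factor(cyl), y = after_stat(prop)))

    props_grouped   <- layer_data(p_grouped)$y
    props_ungrouped <- layer_data(p_ungrouped)$y
    ```

    In the grouped plot the bar heights sum to 1, while in the ungrouped plot every bar has height 1, which is the usual symptom of a forgotten group = 1.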
