Position problem with geom_bar when using both width and dodge

借酒劲吻你 2021-01-25 05:43

I have the following data frame

group1 = c('a', 'b')
group2 = c('1', '1', '2', '2')
mean = 1:4
sd = c(0.2, 0.3, 0.5, 0.8)
df = data.frame(group1, gro         


        
1 Answer
    -上瘾入骨i
    2021-01-25 05:50

    TL;DR: From the start, position = "dodge" (or position = position_dodge()) wasn't doing what you thought it was doing.

    Underlying intuition

    position_dodge is one of the position-adjustment functions in the ggplot2 package. When several elements belonging to different groups occupy the same location, position_identity does nothing at all, position_dodge places the elements side by side horizontally, position_stack places them on top of one another vertically, position_fill stacks them and then rescales each stack so that it spans 0 to 1, and so on.
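    To see these adjustments as numbers rather than pictures, the sketch below builds the same two-bar layer with each position function and inspects the coordinates ggplot2 computed, using ggplot2's layer_data(). It assumes ggplot2 is installed; the data frame `d` and the helper `built()` are hypothetical, made up for illustration.

```r
library(ggplot2)

# Hypothetical data: two bars from different groups (g) at the same x position
d <- data.frame(x = c("a", "a"), y = c(2, 3), g = c("p", "q"))

# Hypothetical helper: build a bar layer with the given position adjustment
# and return the coordinates ggplot2 computed for it
built <- function(pos) {
  p <- ggplot(d, aes(x, y, fill = g)) + geom_col(position = pos)
  layer_data(p)[, c("group", "xmin", "xmax", "ymin", "ymax")]
}

identity_df <- built(position_identity())  # same xmin/xmax: bars overlap
dodge_df    <- built(position_dodge())     # different xmin: bars side by side
stack_df    <- built(position_stack())     # upper bar ends at 2 + 3 = 5
fill_df     <- built(position_fill())      # stacked, then rescaled to end at 1

print(identity_df)
print(dodge_df)
print(stack_df)
print(fill_df)
```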

    Here's a summary of different position-adjusting functions' behaviours, from RStudio's ggplot2 cheat sheet:

    Note that the elements to be dodged (or stacked, etc.) must belong to different groups. If group = is mapped explicitly in a plot, that variable determines which elements are dodged from one another. If there's no explicit group mapping in aes() but there are one or more discrete mappings such as color =, fill = or linetype =, the interaction of all discrete variables in the plot is used instead. From ?aes_group_order:

    By default, the group is set to the interaction of all discrete variables in the plot. This often partitions the data correctly, but when it does not, or when no discrete variable is used in the plot, you will need to explicitly define the grouping structure, by mapping group to a variable that has a different value for each group.
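    A quick way to check which grouping ggplot2 actually used is to count the distinct values of the group column returned by layer_data(). The sketch below assumes the question's df has columns group1, group2, mean and sd; the question's data.frame() call is cut off, so this reconstruction is an assumption based on the plotting code. n_groups() is a hypothetical helper.

```r
library(ggplot2)

# Assumed reconstruction of the question's data (group1 is recycled to length 4)
df <- data.frame(group1 = c("a", "b"), group2 = c("1", "1", "2", "2"),
                 mean = 1:4, sd = c(0.2, 0.3, 0.5, 0.8))

# Hypothetical helper: number of distinct groups ggplot2 assigned to layer 1
n_groups <- function(p) length(unique(layer_data(p)$group))

p_x_only <- ggplot(df, aes(group1, mean)) + geom_col()                  # discrete x only
p_fill   <- ggplot(df, aes(group1, mean, fill = group2)) + geom_col()   # x and fill
p_group  <- ggplot(df, aes(group1, mean, group = group2)) + geom_col()  # explicit group

n_groups(p_x_only)  # 2: one group per x value
n_groups(p_fill)    # 4: one group per (group1, group2) combination
n_groups(p_group)   # 2: only group2 is used
```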

    Plot by plot breakdown

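    Let's start with your original plot. Nothing in its aesthetic mappings distinguished the two rows sharing each x value (the only discrete variable, group1, is the x variable itself), so each x position held a single group, and position = "dodge" did absolutely nothing.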

    We can replace that with position = "identity" for both geom layers (in fact, position = "identity" is the default position for geom_errorbar, so there's no need to spell it out), and the resulting plot would be the same.

    Increasing the transparency makes it obvious that the two bars are occupying the same spot, one "behind" another.

    I guess this original plot isn't what you actually intended? There are really very few scenarios where it would make sense for one bar to be behind another like this...

    ggplot(data = df, aes(x=group1, y = mean))+
      geom_col(position = 'dodge') + 
      geom_errorbar(aes(ymin = mean - sd, ymax = mean + sd),
                    position = 'dodge') +
      ggtitle("original plot")
    
    ggplot(data = df, aes(x=group1, y = mean))+
      geom_col(position = "identity") + 
      geom_errorbar(aes(ymin = mean - sd, ymax = mean + sd)) +
      ggtitle("remove position dodge")
    
    ggplot(data = df, aes(x=group1, y = mean))+
      geom_col(position = "identity", alpha = 0.5) + 
      geom_errorbar(aes(ymin = mean - sd, ymax = mean + sd)) +
      ggtitle("increase transparency")
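    The same conclusion can be checked numerically. This sketch (again assuming a df reconstructed from the question's vectors, since its data.frame() call is truncated) compares the bar coordinates computed for the original plot with those of the position = "identity" version.

```r
library(ggplot2)

# Assumed reconstruction of the question's data frame
df <- data.frame(group1 = c("a", "b"), group2 = c("1", "1", "2", "2"),
                 mean = 1:4, sd = c(0.2, 0.3, 0.5, 0.8))

p_dodge <- ggplot(df, aes(x = group1, y = mean)) +
  geom_col(position = "dodge") +
  geom_errorbar(aes(ymin = mean - sd, ymax = mean + sd), position = "dodge")
p_ident <- ggplot(df, aes(x = group1, y = mean)) +
  geom_col(position = "identity") +
  geom_errorbar(aes(ymin = mean - sd, ymax = mean + sd))

bars_dodge <- layer_data(p_dodge, 1)
bars_ident <- layer_data(p_ident, 1)

# Both bars at x = 1 ("a") share the same group and the same xmin/xmax
bars_dodge[as.numeric(bars_dodge$x) == 1, c("group", "xmin", "xmax", "ymax")]

# Dodging changed nothing compared with identity
all.equal(sort(bars_dodge$xmin), sort(bars_ident$xmin))
```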
    

    I'll skip over the second plot, since adding width = 0.2 didn't change anything fundamental.

    In the third plot, we finally put position = "dodge" to use, because there's a grouping variable now (fill = group2). The bars and error bars move accordingly, each based on its own layer's width. This is expected when position = "dodge" is used instead of position = position_dodge(width = ...): the dodge distance defaults to the geom layer's width unless it's overridden by a specific value in position_dodge(width = ...).

    If the geom_errorbar layer had kept its default width (which is the same as the default width for geom_col), both layers' elements would have been dodged by the same amount.

    ggplot(data = df, aes(x=group1, y = mean, fill = group2))+
      geom_col(position = 'dodge') + 
      geom_errorbar(aes(ymin = mean - sd, ymax = mean + sd), width = 0.2,
                    position = 'dodge') +
      ggtitle("third plot")
    
    ggplot(data = df, aes(x=group1, y = mean, fill = group2))+
      geom_col(position = 'dodge') + 
      geom_errorbar(aes(ymin = mean - sd, ymax = mean + sd), 
                    position = 'dodge') +
      ggtitle("with default width")
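    The mismatch can be quantified with layer_data(), which returns each layer's x positions after dodging. In this sketch (reconstructed df assumed as before; centres() is a hypothetical helper), the bars in the third plot end up 0.9 / 4 either side of each x, but the error bars only 0.2 / 4.

```r
library(ggplot2)

# Assumed reconstruction of the question's data frame
df <- data.frame(group1 = c("a", "b"), group2 = c("1", "1", "2", "2"),
                 mean = 1:4, sd = c(0.2, 0.3, 0.5, 0.8))

# Hypothetical helper: sorted x centres of layer i after position adjustment
centres <- function(p, i) sort(as.numeric(layer_data(p, i)$x))

base <- ggplot(df, aes(x = group1, y = mean, fill = group2)) +
  geom_col(position = "dodge")

p_third <- base +
  geom_errorbar(aes(ymin = mean - sd, ymax = mean + sd), width = 0.2,
                position = "dodge")
p_default <- base +
  geom_errorbar(aes(ymin = mean - sd, ymax = mean + sd), position = "dodge")

centres(p_third, 1)    # bars:       0.775 1.225 1.775 2.225 (x -/+ 0.9 / 4)
centres(p_third, 2)    # error bars: 0.950 1.050 1.950 2.050 (x -/+ 0.2 / 4)
centres(p_default, 2)  # error bars with default width: same as the bars
```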
    

    Side note: we know both geom_errorbar and geom_col have the same default width because they set up their data in the same way. The following line of code can be found in both GeomErrorbar$setup_data and GeomCol$setup_data:

    data$width <- data$width %||% params$width %||% (resolution(data$x, FALSE) * 0.9)
    # i.e. if width is specified as one of the aesthetic mappings, use that;
    #      else if width is specified in the geom layer's parameters, use that;
    #      else, use 90% of the dataset's x-axis variable's resolution.        <- default value of 0.9
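    The fallback chain in that line relies on the %||% operator (return the left-hand side unless it is NULL). The sketch below mimics it with a hypothetical default_width() helper. %||% is defined locally so the snippet doesn't depend on which package exports it, and ggplot2::resolution() gives the x variable's resolution (the smallest gap between distinct values).

```r
# Null-coalescing operator: use x unless it is NULL, otherwise fall back to y
`%||%` <- function(x, y) if (is.null(x)) y else x

# Hypothetical helper mirroring the width fallback in setup_data
default_width <- function(aes_width, param_width, x) {
  aes_width %||% param_width %||% (ggplot2::resolution(x, FALSE) * 0.9)
}

x <- c(1, 2)                  # two x positions, one unit apart
default_width(NULL, NULL, x)  # 0.9: 90% of the resolution (1)
default_width(NULL, 0.2, x)   # 0.2: the layer's width parameter
default_width(0.5, 0.2, x)    # 0.5: a width aesthetic takes priority
```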
    

    In conclusion, when you have different aesthetic groups, the width in position_dodge determines the distance each element is moved, while the width in each geom layer determines each element's... well, width. As long as different geom layers dodge by the same amount, they will line up with one another.

    Below is an arbitrary example for illustration, which uses a different width for each layer (0.5 for geom_col, 0.9 for geom_errorbar) but the same dodge width (0.6):

    ggplot(data = df, aes(x=group1, y = mean, fill = group2))+
      geom_col(position = position_dodge(0.6), width = 0.5) + 
      geom_errorbar(aes(ymin = mean - sd, ymax = mean + sd), width = 0.9,
                    position = position_dodge(0.6)) +
      ggtitle("another example")
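    To confirm that the two layers line up despite their different widths, this sketch (reconstructed df assumed as before) compares the computed centres and widths of both layers. With position_dodge(0.6), each x is split into two slots centred 0.6 / 4 = 0.15 either side, and each layer's own width is shared between the two groups.

```r
library(ggplot2)

# Assumed reconstruction of the question's data frame
df <- data.frame(group1 = c("a", "b"), group2 = c("1", "1", "2", "2"),
                 mean = 1:4, sd = c(0.2, 0.3, 0.5, 0.8))

p <- ggplot(df, aes(x = group1, y = mean, fill = group2)) +
  geom_col(position = position_dodge(0.6), width = 0.5) +
  geom_errorbar(aes(ymin = mean - sd, ymax = mean + sd), width = 0.9,
                position = position_dodge(0.6))

bars <- layer_data(p, 1)
errs <- layer_data(p, 2)

# Same centres for both layers: x -/+ 0.15
all.equal(sort(as.numeric(bars$x)), sort(as.numeric(errs$x)))
# Each bar is 0.5 / 2 = 0.25 wide; each error bar is 0.9 / 2 = 0.45 wide
unique(round(bars$xmax - bars$xmin, 10))
unique(round(errs$xmax - errs$xmin, 10))
```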
    
