I have the following data frame
group1 = c('a', 'b')
group2 = c('1', '1', '2', '2')
mean = 1:4
sd = c(0.2, 0.3, 0.5, 0.8)
df = data.frame(group1, gro
TL;DR: From the start, position = "dodge" (or position = position_dodge(<some width value>)) wasn't doing what you thought it was doing.
position_dodge is one of the position-adjusting functions available in the ggplot2 package. If there are multiple elements belonging to different groups occupying the same location, position_identity would do nothing at all, position_dodge would place the elements side by side horizontally, position_stack would place them on top of one another vertically, position_fill would stack them vertically and rescale each stack to the same total height so that it fills the full height of the plot area, etc.
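To see these adjustments as numbers rather than pictures, one option is to build the same two-element plot with each position and look at the coordinates ggplot2 computes, via layer_data(). This is only a minimal sketch: the toy data frame and the col_data() helper are made up for illustration and are not part of the question.

```r
library(ggplot2)

# Hypothetical toy data: two elements from different groups at the same x.
toy <- data.frame(x = "a", g = c("1", "2"), y = c(1, 3))

# Hypothetical helper: build a geom_col plot with the given position
# adjustment and return the coordinates ggplot2 computed for that layer.
col_data <- function(position) {
  p <- ggplot(toy, aes(x = x, y = y, fill = g)) +
    geom_col(position = position)
  layer_data(p)[, c("group", "x", "ymin", "ymax")]
}

identity_pos <- col_data("identity")  # both bars keep the same x and start at 0
dodge_pos    <- col_data("dodge")     # bars are shifted apart along x
stack_pos    <- col_data("stack")     # bars are piled up along y
fill_pos     <- col_data("fill")      # stacked, then rescaled to a total of 1
```

Printing the four data frames side by side should show x changing only under "dodge", and ymin / ymax changing only under "stack" and "fill".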
[Image: summary of the different position-adjusting functions' behaviours, from RStudio's ggplot2 cheat sheet]
Note that the elements to be dodged / etc. must belong to different groups. If group = <some variable> is specified explicitly in a plot, that would be used as the grouping variable for determining which elements should be dodged / etc. from one another. If there's no explicit group mapping in aes(), but there's one or more of color = <some variable> / fill = <some variable> / linetype = <some variable> / and so on, the interaction of all discrete variables would be used. From ?aes_group_order:
By default, the group is set to the interaction of all discrete variables in the plot. This often partitions the data correctly, but when it does not, or when no discrete variable is used in the plot, you will need to explicitly define the grouping structure, by mapping group to a variable that has a different value for each group.
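One way to check which groups ggplot2 actually assigned is to inspect the group column returned by layer_data(). The sketch below assumes a reconstruction of the question's data frame (the original data.frame() call is cut off, so the construction here is a guess based on the column names used in the plots), and counts how many distinct groups share each x position with and without a fill mapping.

```r
library(ggplot2)

# Assumed reconstruction of the question's data frame (the original
# data.frame() call is truncated); group1 is recycled to a, b, a, b.
df <- data.frame(
  group1 = rep(c("a", "b"), times = 2),
  group2 = c("1", "1", "2", "2"),
  mean   = 1:4,
  sd     = c(0.2, 0.3, 0.5, 0.8)
)

# Only x is discrete: rows sharing an x value also share a group.
p_x_only <- ggplot(df, aes(x = group1, y = mean)) +
  geom_col(position = "identity")

# x and fill are both discrete: every group1 / group2 combination
# gets its own group.
p_fill <- ggplot(df, aes(x = group1, y = mean, fill = group2)) +
  geom_col(position = "identity")

groups_x_only <- layer_data(p_x_only)[, c("x", "group")]
groups_fill   <- layer_data(p_fill)[, c("x", "group")]

# Number of distinct groups found at each x position.
n_per_x_only <- tapply(groups_x_only$group, as.numeric(groups_x_only$x),
                       function(g) length(unique(g)))
n_per_x_fill <- tapply(groups_fill$group, as.numeric(groups_fill$x),
                       function(g) length(unique(g)))
```

With a single group per x position there is nothing for a position adjustment to separate; with two groups per x position, there is.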
Let's start with your original plot. The only discrete variable in the plot's aesthetic mappings was group1 on the x-axis, so nothing distinguished the two rows sharing each x position as separate groups, and position = "dodge" did absolutely nothing.
We can replace that with position = "identity" for both geom layers (in fact, position = "identity" is the default position for geom_errorbar, so there's no need to spell it out), and the resulting plot would be the same.
Increasing the transparency makes it obvious that the two bars are occupying the same spot, one "behind" another.
I guess this original plot isn't what you actually intended? There are really very few scenarios where it would make sense for one bar to be behind another like this...
ggplot(data = df, aes(x = group1, y = mean)) +
  geom_col(position = 'dodge') +
  geom_errorbar(aes(ymin = mean - sd, ymax = mean + sd),
                position = 'dodge') +
  ggtitle("original plot")

ggplot(data = df, aes(x = group1, y = mean)) +
  geom_col(position = "identity") +
  geom_errorbar(aes(ymin = mean - sd, ymax = mean + sd)) +
  ggtitle("remove position dodge")

ggplot(data = df, aes(x = group1, y = mean)) +
  geom_col(position = "identity", alpha = 0.5) +
  geom_errorbar(aes(ymin = mean - sd, ymax = mean + sd)) +
  ggtitle("increase transparency")
I'll skip over the second plot, since adding width = 0.2 didn't change anything fundamental.
In the third plot, we finally put position = "dodge" to use, because there's a group variable now. The bars & errorbars move accordingly, based on their respective widths. This is the expected behaviour when position = "dodge" is used rather than position = position_dodge(width = <some value>, ...): the distance dodged follows each geom layer's own width, unless it's overridden by a specific value in position_dodge(width = ...). Here the errorbars were given width = 0.2 while the bars kept their default width, so the two layers dodged by different amounts.
If the geom_errorbar layer kept to its default width (which is the same as the default width for geom_col), both layers' elements would have been dodged by the same amount.
ggplot(data = df, aes(x = group1, y = mean, fill = group2)) +
  geom_col(position = 'dodge') +
  geom_errorbar(aes(ymin = mean - sd, ymax = mean + sd), width = 0.2,
                position = 'dodge') +
  ggtitle("third plot")

ggplot(data = df, aes(x = group1, y = mean, fill = group2)) +
  geom_col(position = 'dodge') +
  geom_errorbar(aes(ymin = mean - sd, ymax = mean + sd),
                position = 'dodge') +
  ggtitle("with default width")
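If you'd rather confirm the misalignment numerically than by eye, a rough check is to compare the horizontal centres of the elements in each layer. The sketch below assumes the same reconstructed data frame as the question (its data.frame() call is truncated, so this is a guess), and the centres() helper is a hypothetical name introduced only for this check.

```r
library(ggplot2)

# Assumed reconstruction of the question's data frame (original call is truncated).
df <- data.frame(
  group1 = rep(c("a", "b"), times = 2),
  group2 = c("1", "1", "2", "2"),
  mean   = 1:4,
  sd     = c(0.2, 0.3, 0.5, 0.8)
)

# Hypothetical helper: sorted horizontal centres of the elements in one layer.
centres <- function(plot, layer) {
  d <- layer_data(plot, layer)
  sort((as.numeric(d$xmin) + as.numeric(d$xmax)) / 2)
}

# Errorbars narrowed to width = 0.2, both layers using position = "dodge".
narrow <- ggplot(df, aes(x = group1, y = mean, fill = group2)) +
  geom_col(position = "dodge") +
  geom_errorbar(aes(ymin = mean - sd, ymax = mean + sd), width = 0.2,
                position = "dodge")

# Errorbars left at their default width.
default_width <- ggplot(df, aes(x = group1, y = mean, fill = group2)) +
  geom_col(position = "dodge") +
  geom_errorbar(aes(ymin = mean - sd, ymax = mean + sd),
                position = "dodge")

# Do the bar centres (layer 1) coincide with the errorbar centres (layer 2)?
narrow_aligned  <- isTRUE(all.equal(centres(narrow, 1), centres(narrow, 2)))
default_aligned <- isTRUE(all.equal(centres(default_width, 1),
                                    centres(default_width, 2)))
```

Following the explanation above, narrow_aligned should come out FALSE and default_aligned TRUE, since only the second plot has both layers dodging by the same amount.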
Side note: We know both geom_errorbar & geom_col have the same default width, because they set up their data in the same way. The following line of code can be found in both GeomErrorbar$setup_data / GeomCol$setup_data:
data$width <- data$width %||% params$width %||% (resolution(data$x, FALSE) * 0.9)
# i.e. if width is specified as one of the aesthetic mappings, use that;
# else if width is specified in the geom layer's parameters, use that;
# else, use 90% of the dataset's x-axis variable's resolution. <- default value of 0.9
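To make the fallback order concrete, here is a small stand-alone imitation of that line. It is a sketch, not ggplot2's own code: resolve_width() is a hypothetical helper, and %||% is defined locally so the example doesn't depend on which package or R version provides it. Only resolution() comes from ggplot2.

```r
library(ggplot2)

# Local null-default operator, defined here so the sketch is self-contained.
`%||%` <- function(a, b) if (is.null(a)) b else a

# Hypothetical helper imitating the fallback chain in setup_data:
# aesthetic width, then layer parameter width, then 90% of the x resolution.
resolve_width <- function(aes_width, param_width, x) {
  aes_width %||% param_width %||% (resolution(x, FALSE) * 0.9)
}

x_positions <- c(1, 2)  # two discrete x values, mapped to positions 1 and 2

w_default <- resolve_width(NULL, NULL, x_positions)  # nothing specified
w_param   <- resolve_width(NULL, 0.2, x_positions)   # width set in the layer
w_aes     <- resolve_width(0.4, 0.2, x_positions)    # width mapped as an aesthetic
```

With x positions one unit apart, the three results should be 0.9, 0.2 and 0.4 respectively, matching the order of precedence described in the comments above.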
In conclusion, when you have different aesthetic groups, specifying the width in position_dodge determines the distance moved by each element, while specifying the width in each geom layer determines each element's... well, width. As long as different geom layers dodge by the same amount, they will be in alignment with one another.
Below is a random example for illustration, which uses different width values for each layer (0.5 for geom_col, 0.9 for geom_errorbar), but the same dodge width (0.6):
ggplot(data = df, aes(x = group1, y = mean, fill = group2)) +
  geom_col(position = position_dodge(0.6), width = 0.5) +
  geom_errorbar(aes(ymin = mean - sd, ymax = mean + sd), width = 0.9,
                position = position_dodge(0.6)) +
  ggtitle("another example")
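As a final sanity check on the "dodge width vs. element width" distinction, the computed layer data for this last example can be split into centres (driven by position_dodge) and widths (driven by each layer's width). Again, this is a sketch under the assumption that the question's data frame looks like the reconstruction below; the variable names are made up for illustration.

```r
library(ggplot2)

# Assumed reconstruction of the question's data frame (original call is truncated).
df <- data.frame(
  group1 = rep(c("a", "b"), times = 2),
  group2 = c("1", "1", "2", "2"),
  mean   = 1:4,
  sd     = c(0.2, 0.3, 0.5, 0.8)
)

p <- ggplot(df, aes(x = group1, y = mean, fill = group2)) +
  geom_col(position = position_dodge(0.6), width = 0.5) +
  geom_errorbar(aes(ymin = mean - sd, ymax = mean + sd), width = 0.9,
                position = position_dodge(0.6))

bars <- layer_data(p, 1)
errs <- layer_data(p, 2)

# Centres: where each element sits after dodging.
bar_centres <- sort((as.numeric(bars$xmin) + as.numeric(bars$xmax)) / 2)
err_centres <- sort((as.numeric(errs$xmin) + as.numeric(errs$xmax)) / 2)

# Widths: how wide each element is drawn after dodging.
bar_widths <- as.numeric(bars$xmax) - as.numeric(bars$xmin)
err_widths <- as.numeric(errs$xmax) - as.numeric(errs$xmin)
```

The centres should match across the two layers (both dodge by 0.6), while the errorbars should come out wider than the bars, because their layer widths differ.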