I am doing some analysis in ggplot2 at the moment for a project and by chance I stumbled across some (for me) weird behavior that I cannot explain. When I write aes(x
tl;dr
Never use [
or $
inside aes()
.
Consider this illustrative example where the facetting variable f
is purposely in a non-obvious order with respect to x
d <- data.frame(x=1:10, f=rev(letters[gl(2,5)]))
Now contrast what happens with these two plots,
p1 <- ggplot(d) +
facet_grid(.~f, labeller = label_both) +
geom_text(aes(x, y=0, label=x, colour=f)) +
ggtitle("good mapping")
p2 <- ggplot(d) +
facet_grid(.~f, labeller = label_both) +
geom_text(aes(d$x, y=0, label=x, colour=f)) +
ggtitle("$ corruption")
We can get a better idea of what's happening by looking at the data.frame created internally by ggplot2 for each panel,
ggplot_build(p1)[["data"]][[1]][,c("x","PANEL")]
x PANEL
1 6 1
2 7 1
3 8 1
4 9 1
5 10 1
6 1 2
7 2 2
8 3 2
9 4 2
10 5 2
ggplot_build(p2)[["data"]][[1]][,c("x", "PANEL")]
x PANEL
1 1 1
2 2 1
3 3 1
4 4 1
5 5 1
6 6 2
7 7 2
8 8 2
9 9 2
10 10 2
The second plot has the wrong mapping, because when ggplot creates a data.frame for each panel, it picks x values in the "wrong" order.
This occurs because the use of $
breaks the link between the various variables to be mapped (ggplot must assume it's an independent variable, which for all it knows could come from an arbitrary, disconnected source). Since the data.frame in this example is not ordered according to the factor f
, the subset data.frames used internally for each panel assume the wrong order.