So, I have a fairly large dataset (Dropbox: csv file) that I'm trying to plot using geom_boxplot. The following produces what appears to be a reasonable plot:
The solution to this question lies in how scale_y_continuous is applied. ggplot2 performs operations in the following order: scale transformations, then statistical computations, then coordinate transformations.
In this case, because a scale transformation is invoked, ggplot2 excludes data outside the scale limits before the statistical computation of the boxplot hinges and median. The medians calculated by the aggregate function and used in the geom_text instruction, however, are computed from the entire dataset. As a result, the median lines drawn in the boxes can differ from the text labels.
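To make the mismatch concrete, here is a minimal sketch on made-up data. The `toy` data frame and its values are hypothetical and are not the dataset from the question; `layer_data()` is used only to read back the statistics ggplot2 computed for the boxplot layer.

```r
library(ggplot2)

# Hypothetical toy data (not the dataset from the question):
# four values inside the 0-15 window and three above it
toy <- data.frame(
  year  = factor(2000),
  value = c(1, 2, 3, 4, 20, 30, 40)
)

# Median over the entire dataset, as aggregate() would compute it for the labels
label_median <- aggregate(value ~ year, data = toy, FUN = median)$value

# Boxplot with scale limits: values above 15 are dropped before the stat runs
p_scale <- ggplot(toy, aes(x = year, y = value)) +
  geom_boxplot() +
  scale_y_continuous(limits = c(0, 15))

# layer_data() returns the computed boxplot statistics; `middle` is the median.
# Building the plot warns about the removed rows, which is silenced here.
box_median <- suppressWarnings(layer_data(p_scale)$middle)

label_median  # 4, from all seven values
box_median    # 2.5, from the four values inside the limits
```

The warning that ggplot2 prints about removed rows when drawing such a plot is the visible hint that the stat is working on a subset.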
The solution is to omit the scale_y_continuous instruction and instead use:
d <- ggplot(data = df, aes(x = year, y = value)) +
  geom_boxplot(aes(fill = station)) +
  facet_grid(station ~ .) +
  theme(legend.position = "none") +
  coord_cartesian(ylim = c(0, 15))  # zooms the view; no data is dropped
This allows ggplot2 to calculate the boxplot hinge stats using the entire dataset, while restricting only the visible range of the y-axis in the figure.
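As a quick sanity check, the following sketch confirms on made-up data that zooming with coord_cartesian leaves the computed median untouched. Again, `toy` is a hypothetical data frame rather than the data from the question.

```r
library(ggplot2)

# Hypothetical toy data with three values above the 0-15 window
toy <- data.frame(year = factor(2000), value = c(1, 2, 3, 4, 20, 30, 40))

# Zoom the y-axis to 0-15 without removing any observations
p_coord <- ggplot(toy, aes(x = year, y = value)) +
  geom_boxplot() +
  coord_cartesian(ylim = c(0, 15))

# The boxplot median is computed from all seven values
zoom_median <- layer_data(p_coord)$middle

isTRUE(all.equal(zoom_median, median(toy$value)))  # TRUE
```

With this approach, labels built from aggregate() medians and the median lines in the boxes are derived from the same data, so they should line up.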