Do you have any idea how to apply jittering only to the outlier points of a boxplot? This is the code:
ggplot(data = a, aes(x = "", y = V8)) +
  geom_boxplot()
This is a slightly different approach from the other answer (it assigns a color variable that is NA for non-outliers), and it computes both the upper and lower bounds explicitly.
The default "outlier" definition is a point more than 1.5 x the interquartile range (IQR) below the 25th percentile or above the 75th percentile.
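To make the rule concrete, here is a minimal base R sketch on a small made-up vector (the values and variable names are only for illustration). It computes the two bounds and flags the points outside them.

```r
# Made-up values with one obviously extreme point
v <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 30)

q   <- quantile(v, c(.25, .75), names = FALSE)  # 25th and 75th percentiles
iqr <- q[2] - q[1]                              # interquartile range

lower <- q[1] - 1.5 * iqr  # -3.5
upper <- q[2] + 1.5 * iqr  # 14.5

is_outlier <- v < lower | v > upper
v[is_outlier]  # 30
```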
Generate some sample data:
set.seed(1)
a <- data.frame(x = factor(rep(1:4, each = 1000)), # base data.frame, so no package is needed yet
                V8 = c(rnorm(1000, 25, 4),
                       rnorm(1000, 50, 4),
                       rnorm(1000, 75, 4),
                       rnorm(1000, 100, 4)))
Calculate the upper/lower limit outliers (uses dplyr/tidyverse functions):
library(tidyverse)
a <- a %>% group_by(x) %>%
  mutate(outlier.high = V8 > quantile(V8, .75) + 1.50*IQR(V8),
         outlier.low = V8 < quantile(V8, .25) - 1.50*IQR(V8))
Define a color for the upper/lower points:
a <- a %>% mutate(outlier.color = case_when(outlier.high ~ "red",
outlier.low ~ "steelblue"))
The unclassified cases get NA as their color and will not appear in the plot.
The dplyr::case_when() function is not completely stable yet (it may require the GitHub development version, newer than 0.5), so here is a base alternative if that does not work:
a$outlier.color <- NA
a$outlier.color[a$outlier.high] <- "red"
a$outlier.color[a$outlier.low] <- "steelblue"
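If dplyr is not available at all, the per-group flagging can also be done in base R. The sketch below is only an illustration: `outlier_color()` is a hypothetical helper name, and the data frame `d` is made up so that the block runs on its own.

```r
# Hypothetical helper: "red" above the upper bound, "steelblue" below the
# lower bound, NA otherwise (1.5 x IQR rule by default)
outlier_color <- function(v, coef = 1.5) {
  q <- quantile(v, c(.25, .75), names = FALSE)
  fence <- coef * (q[2] - q[1])
  out <- rep(NA_character_, length(v))
  out[v > q[2] + fence] <- "red"
  out[v < q[1] - fence] <- "steelblue"
  out
}

# Made-up data: two groups, each with one extreme value
d <- data.frame(g = rep(c("a", "b"), each = 6),
                y = c(1, 2, 3, 4, 5, 40, 10, 11, 12, 13, 14, -30))

# split() applies the helper within each group; unsplit() restores row order
d$outlier.color <- unsplit(lapply(split(d$y, d$g), outlier_color), d$g)
d
```

Here the value 40 in group "a" is colored "red" and the value -30 in group "b" is colored "steelblue"; every other row stays NA.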
Plot:
a %>% ggplot(aes(x, V8)) +
  geom_boxplot(outlier.shape = NA) + # hide the default outlier points
  geom_jitter(color = a$outlier.color, width = .2, height = 0) + # NA colors are not plotted; jitter horizontally only
  theme_bw() + coord_flip()
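Passing a$outlier.color as a bare vector relies on the rows of `a` staying in the same order as the plotted data. As a variant sketch (not the only way to do it), the color column can be mapped inside aes() and used literally via scale_color_identity(), with only the flagged rows given to the jitter layer. The data frame `d` below is made up so the example is self-contained.

```r
library(ggplot2)

# Made-up data: two groups, each with one low and one high extreme value
d <- data.frame(
  g = factor(rep(c("a", "b"), each = 52)),
  y = c(seq(20, 30, length.out = 50), 5, 45,
        seq(45, 55, length.out = 50), 30, 70)
)

# Per-group quartiles; ave() returns one value per row
q1 <- ave(d$y, d$g, FUN = function(v) quantile(v, .25))
q3 <- ave(d$y, d$g, FUN = function(v) quantile(v, .75))

d$outlier.color <- NA_character_
d$outlier.color[d$y > q3 + 1.5 * (q3 - q1)] <- "red"
d$outlier.color[d$y < q1 - 1.5 * (q3 - q1)] <- "steelblue"

# Only the flagged rows go to the jitter layer
out_rows <- d[!is.na(d$outlier.color), ]

p <- ggplot(d, aes(g, y)) +
  geom_boxplot(outlier.shape = NA) +       # hide the default outlier points
  geom_jitter(data = out_rows, aes(color = outlier.color),
              width = 0.2, height = 0) +   # jitter horizontally only
  scale_color_identity() +                 # use the stored color names as-is
  theme_bw() + coord_flip()
p
```

Because the jitter layer receives only the outlier rows, no invisible points are drawn for the rest of the data.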
Add a column to your data set to indicate which points are outliers. Then set geom_boxplot to not plot any outliers, and use geom_point to plot the outliers explicitly.
I will use the diamonds data set from ggplot2 to illustrate.
library(ggplot2)
library(dplyr)
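diamonds2 <-
  diamonds %>%
  group_by(cut) %>%
  # flag points more than 1.5 x IQR beyond the quartiles (the boxplot default)
  mutate(outlier = price > quantile(price, .75) + IQR(price) * 1.5 |
                   price < quantile(price, .25) - IQR(price) * 1.5) %>%
  ungroup()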
ggplot(diamonds2) +
  aes(x = cut, y = price) +
  geom_boxplot(outlier.shape = NA) + # no outliers drawn by the boxplot
  geom_point(data = function(x) dplyr::filter(x, outlier), # outliers only
             position = position_jitter(width = 0.2, height = 0)) # jitter horizontally only