apply jittering to outliers data in a boxplot with ggplot2

醉酒成梦 2021-02-14 07:50

Do you have any idea how to apply jittering only to the outliers of a boxplot? This is the code:

ggplot(data = a, aes(x = "", y = V8)) +
  geom_boxplot()


        
2 Answers
  •  遥遥无期
    2021-02-14 08:05

    This is a slightly different approach from the one above: it assigns a color variable that is NA for non-outliers, and it corrects the calculation of the upper and lower bounds.

    By default, an "outlier" is a point that lies below the 25th percentile minus 1.5 x the interquartile range (IQR), or above the 75th percentile plus 1.5 x the IQR.
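
    As a quick check of that rule, here is a minimal base R sketch on a made-up vector. The vector `v` and the variable names are illustrative only, and it assumes the quartiles come from `quantile()` with its default settings:

    ```r
    # Made-up data: nine ordinary values and one extreme value
    v <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 50)

    q     <- unname(quantile(v, c(0.25, 0.75)))  # 3.25 and 7.75
    iqr   <- q[2] - q[1]                         # 4.5, the same as IQR(v)
    lower <- q[1] - 1.5 * iqr                    # -3.5
    upper <- q[2] + 1.5 * iqr                    # 14.5

    # Keep only the values outside the two fences
    outliers <- v[v < lower | v > upper]
    print(outliers)  # 50
    ```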

    Generate some sample data:

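    set.seed(1)
    # tibble() replaces the deprecated data_frame(); the tibble:: prefix works before library() is called
    a <- tibble::tibble(x  = factor(rep(1:4, each = 1000)),
                        V8 = c(rnorm(1000, 25, 4),
                               rnorm(1000, 50, 4),
                               rnorm(1000, 75, 4),
                               rnorm(1000, 100, 4)))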
    

    Flag the points beyond the upper/lower limits within each group (uses dplyr functions from the tidyverse):

    library(tidyverse)
    a <- a %>% group_by(x) %>% 
      mutate(outlier.high = V8 > quantile(V8, .75) + 1.50*IQR(V8),
             outlier.low = V8 < quantile(V8, .25) - 1.50*IQR(V8))
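
    Before plotting, it can help to check how many points were flagged in each group. The following is a small self-contained sketch, assuming dplyr is installed; the smaller data set `d` and the name `counts` are made up for illustration:

    ```r
    library(dplyr)

    set.seed(1)
    # Illustrative data: two groups of 200 points each
    d <- tibble(x  = factor(rep(1:2, each = 200)),
                V8 = c(rnorm(200, 25, 4), rnorm(200, 50, 4)))

    counts <- d %>%
      group_by(x) %>%
      # Same fence rule as above, evaluated separately within each group
      mutate(outlier.high = V8 > quantile(V8, .75) + 1.5 * IQR(V8),
             outlier.low  = V8 < quantile(V8, .25) - 1.5 * IQR(V8)) %>%
      # Collapse to one row per group with the number of flagged points
      summarise(n_high = sum(outlier.high),
                n_low  = sum(outlier.low),
                n      = n())

    print(counts)  # one row per group
    ```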
    

    Define a color for the upper/lower points:

    a <- a %>% mutate(outlier.color = case_when(outlier.high ~ "red",
                                           outlier.low ~ "steelblue"))
    

    Points that are neither high nor low outliers get NA as their color and are not drawn in the plot.

    At the time of writing, the dplyr::case_when() function was not completely stable (it might require a GitHub development version of dplyr newer than 0.5), so here is a base R alternative if it does not work:

    a$outlier.color <- NA
    a$outlier.color[a$outlier.high] <- "red"
    a$outlier.color[a$outlier.low] <- "steelblue"
    

    Plot:

    a %>% ggplot(aes(x, V8)) + 
      geom_boxplot(outlier.shape = NA)  + 
      geom_jitter(color = a$outlier.color, width = .2) + # NA not plotted 
      theme_bw() + coord_flip()
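
    An alternative sketch of the same idea passes only the outlier rows to geom_jitter() through its data argument, so no NA colors are needed. It assumes ggplot2 is installed. The helper is_outlier() and the object names are hypothetical, and height = 0 is an added choice so that the jitter moves points only sideways and leaves the V8 values unchanged:

    ```r
    library(ggplot2)

    set.seed(1)
    a <- data.frame(x  = factor(rep(1:4, each = 1000)),
                    V8 = c(rnorm(1000, 25, 4), rnorm(1000, 50, 4),
                           rnorm(1000, 75, 4), rnorm(1000, 100, 4)))

    # Hypothetical helper: TRUE for values beyond the 1.5 * IQR fences
    is_outlier <- function(v) {
      q <- quantile(v, c(0.25, 0.75), names = FALSE)
      fence <- 1.5 * (q[2] - q[1])
      v < q[1] - fence | v > q[2] + fence
    }

    # ave() applies the helper within each level of x; it returns 0/1, hence == 1
    a$outlier <- ave(a$V8, a$x, FUN = is_outlier) == 1

    p <- ggplot(a, aes(x, V8)) +
      geom_boxplot(outlier.shape = NA) +      # hide the default outlier points
      geom_jitter(data = a[a$outlier, ],      # draw only the outliers, jittered
                  width = 0.2, height = 0, color = "red")

    print(p)
    ```

    Subsetting the data keeps the jitter layer small, which matters little here but can help with large data sets.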
    
