force boxplots from geom_boxplot to constant width

北荒 2021-02-19 04:01

I'm making a boxplot in which x and fill are mapped to different variables, a bit like this:

ggplot(mpg, aes(x=as.factor(cyl), y=cty, fill=as.factor(drv))) +
  geom_boxplot()

2 Answers
  •  [愿得一人]
    2021-02-19 04:57

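    The problem is caused by combinations of factor levels that have no data. geom_boxplot() dodges the boxes present at each x position across the full available width, so an x position with fewer fill groups gets wider boxes. The number of data points for every combination of the levels of cyl and drv can be checked with xtabs: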

    library(ggplot2)  # provides the mpg data set
    tab <- xtabs(~ drv + cyl, mpg)
    
    tab
    
    #    cyl
    # drv  4  5  6  8
    #   4 23  0 32 48
    #   f 58  4 43  1
    #   r  0  0  4 21
    

    There are three empty cells. As a workaround, I will add fake data for these combinations.

    Check the range of the dependent variable (y-axis). The fake values must lie outside this range so that they can be hidden later.

    range(mpg$cty)
    # [1]  9 35
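
    Rather than hard-coding -1, the placeholder can be derived from the observed minimum. A minimal sketch, where `y` is a hypothetical stand-in for `mpg$cty` with the same range:

    ```r
    # Hypothetical stand-in for mpg$cty (same range: 9 to 35)
    y <- c(18, 21, 20, 9, 16, 35, 18)

    # Any value strictly below the observed minimum will fall outside the
    # zoomed y-axis window; subtracting 10 leaves a generous margin
    fake_y <- min(y) - 10
    fake_y
    # [1] -1
    ```

    With the real data this also gives -1, the value used below.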
    

    Create a subset of mpg with the data needed for the plot:

    tmp <- mpg[c("cyl", "drv", "cty")]
    

    Get the row and column indices of the empty cells:

    idx <- which(tab == 0, arr.ind = TRUE)
    
    idx
    
    #   row col
    # r   3   1
    # 4   1   2
    # r   3   2
    

    Create three fake rows, using -1 as the value of cty (below the minimum of 9):

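    # For each empty cell, build a one-row data frame:
    # cyl from the column name, drv from the row name, cty = -1
    fakeLines <- apply(idx, 1,
                       function(x) 
                         setNames(data.frame(as.integer(dimnames(tab)[[2]][x[2]]), 
                                             dimnames(tab)[[1]][x[1]], 
                                             -1), 
                                  names(tmp)))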
    
    fakeLines
    
    # $r
    #   cyl drv cty
    # 1   4   r  -1
    # 
    # $`4`
    #   cyl drv cty
    # 1   5   4  -1
    # 
    # $r
    #   cyl drv cty
    # 1   5   r  -1
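
    An alternative to the apply() call, sketched on the same hypothetical `toy` data: as.data.frame() on a table lists every level combination with its count in a `Freq` column, so the empty cells are simply the rows where `Freq == 0`.

    ```r
    # Hypothetical toy data with the same columns as tmp
    toy <- data.frame(
      cyl = c(4L, 4L, 6L, 6L, 8L),
      drv = c("f", "4", "f", "r", "r"),
      cty = c(28, 21, 18, 17, 14)
    )
    tab <- xtabs(~ drv + cyl, toy)

    # One row per drv/cyl combination, with its count in Freq
    cells <- as.data.frame(tab, stringsAsFactors = FALSE)

    # Keep only the empty combinations and give them a placeholder cty
    fake <- cells[cells$Freq == 0, c("cyl", "drv")]
    fake$cyl <- as.integer(fake$cyl)
    fake$cty <- -1
    rownames(fake) <- NULL
    fake
    ```

    This returns a single data frame directly, so no do.call(rbind, ...) step is needed afterwards.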
    

    Add the rows to the existing data:

    tmp2 <- rbind(tmp, do.call(rbind, fakeLines))
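
    After the rows are bound, every cell of the cross-table should be non-zero. A self-contained check on the hypothetical `toy` data, with placeholder rows for its four empty cells (with the real data, the same xtabs() call would be run on tmp2):

    ```r
    # Hypothetical toy data and its placeholder rows
    toy <- data.frame(
      cyl = c(4L, 4L, 6L, 6L, 8L),
      drv = c("f", "4", "f", "r", "r"),
      cty = c(28, 21, 18, 17, 14)
    )
    fake <- data.frame(
      cyl = c(4L, 6L, 8L, 8L),
      drv = c("r", "4", "4", "f"),
      cty = -1
    )
    toy2 <- rbind(toy, fake)

    # Every drv/cyl combination now has at least one row
    tab2 <- xtabs(~ drv + cyl, toy2)
    all(tab2 > 0)
    # [1] TRUE
    ```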
    

    Plot:

    library(ggplot2)
    ggplot(tmp2, aes(x = as.factor(cyl), y = cty, fill = as.factor(drv))) + 
      geom_boxplot() +
      # Zoom the y-axis to the real data range: coord_cartesian() hides the
      # fake data without removing it, so every x position keeps three boxes.
      coord_cartesian(ylim = c(min(tmp$cty) - 3, max(tmp$cty) + 3))
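
    Zooming with coord_cartesian() matters here: setting limits on the y scale (ylim() or scale_y_continuous(limits = ...)) removes out-of-range rows before the boxplot statistics are computed, which would discard the fake rows and bring the unequal widths back. The base-R sketch below is not ggplot2 code; it only emulates that filtering on the hypothetical `toy` data to show the effect:

    ```r
    # Hypothetical toy data plus placeholder rows for its empty cells
    toy <- data.frame(
      cyl = c(4L, 4L, 6L, 6L, 8L),
      drv = c("f", "4", "f", "r", "r"),
      cty = c(28, 21, 18, 17, 14)
    )
    fake <- data.frame(cyl = c(4L, 6L, 8L, 8L), drv = c("r", "4", "4", "f"), cty = -1)
    toy2 <- rbind(toy, fake)

    # Visible window, computed from the real data only (as in the answer)
    lims <- c(min(toy$cty) - 3, max(toy$cty) + 3)

    # Emulate scale limits: rows outside the window are dropped before stats
    kept <- toy2[toy2$cty >= lims[1] & toy2$cty <= lims[2], ]

    # The placeholder rows are gone, so the empty cells are back
    sum(xtabs(~ drv + cyl, kept) == 0)
    # [1] 4
    ```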
    
