Follow up to stat_contour_2d bins - interpretation

寵の児 提交于 2021-01-27 15:05:06

问题


This is a direct follow up to How to interpret ggplot2::stat_density2d.

bins has been re-added as an argument see this thread and the corresponding github issue, but it remains a mistery to me how to interpret those bins.

This answer (answer 1) suggests a way to calculate contour lines based on probabilities, and this answer argues that the current use of kde2d in stat_density_2d would not mean that the bins can be interpreted as percentiles.

So the question. When trying both approaches in order to get estimate quintile probabilities of the data, I get four lines as expected using the approach from answer 1, but only three lines with bins = 5 in stat_density_2d. (which, as I believe, would give 4 bins!)

The fifth bin could be this tiny little dot in the centre which appears (maybe the centroid??)???

Is one of the ways utterly wrong? Or both? Or just two ways of estimating probabilities with their very own imprecision?

library(ggplot2)

#modifying function from answer1
prob_contour <- function(data, n = 50, prob = 0.95, ...) {

  post1 <- MASS::kde2d(data[[1]], data[[2]], n = n, ...)

  dx <- diff(post1$x[1:2])
  dy <- diff(post1$y[1:2])
  sz <- sort(post1$z)
  c1 <- cumsum(sz) * dx * dy

  levels <- sapply(prob, function(x) {
    approx(c1, sz, xout = 1 - x)$y
  })

  df <- as.data.frame(grDevices::contourLines(post1$x, post1$y, post1$z, levels = levels))
  df$x <- round(df$x, 3)
  df$y <- round(df$y, 3)
  df$level <- round(df$level, 2)
  df$prob <- as.character(prob)

  df
}

set.seed(1)
n=100
foo <- data.frame(x=rnorm(n, 0, 1), y=rnorm(n, 0, 1))

df_contours <- dplyr::bind_rows(
  purrr::map(seq(0.2, 0.8, 0.2), function(p) prob_contour(foo, prob = p))
)

ggplot() +
  stat_density_2d(data = foo, aes(x, y), bins = 5, color = "black") +
  geom_point(data = foo, aes(x = x, y = y)) +
  geom_polygon(data = df_contours, aes(x = x, y = y, color = prob), fill = NA) +
  scale_color_brewer(name = "Probs", palette = "Set1")

Created on 2020-05-15 by the reprex package (v0.3.0)

devtools::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value                       
#>  version  R version 4.0.0 (2020-04-24)
#>  os       macOS Catalina 10.15.4      
#>  system   x86_64, darwin17.0          
#>  ui       X11                         
#>  language (EN)                        
#>  collate  en_GB.UTF-8                 
#>  ctype    en_GB.UTF-8                 
#>  tz       Europe/London               
#>  date     2020-05-15                  
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package      * version  date       lib source        
#>  assertthat     0.2.1    2019-03-21 [1] CRAN (R 4.0.0)
#>  backports      1.1.7    2020-05-13 [1] CRAN (R 4.0.0)
#>  callr          3.4.3    2020-03-28 [1] CRAN (R 4.0.0)
#>  cli            2.0.2    2020-02-28 [1] CRAN (R 4.0.0)
#>  colorspace     1.4-1    2019-03-18 [1] CRAN (R 4.0.0)
#>  crayon         1.3.4    2017-09-16 [1] CRAN (R 4.0.0)
#>  curl           4.3      2019-12-02 [1] CRAN (R 4.0.0)
#>  desc           1.2.0    2018-05-01 [1] CRAN (R 4.0.0)
#>  devtools       2.3.0    2020-04-10 [1] CRAN (R 4.0.0)
#>  digest         0.6.25   2020-02-23 [1] CRAN (R 4.0.0)
#>  dplyr          0.8.5    2020-03-07 [1] CRAN (R 4.0.0)
#>  ellipsis       0.3.0    2019-09-20 [1] CRAN (R 4.0.0)
#>  evaluate       0.14     2019-05-28 [1] CRAN (R 4.0.0)
#>  fansi          0.4.1    2020-01-08 [1] CRAN (R 4.0.0)
#>  farver         2.0.3    2020-01-16 [1] CRAN (R 4.0.0)
#>  fs             1.4.1    2020-04-04 [1] CRAN (R 4.0.0)
#>  ggplot2      * 3.3.0    2020-03-05 [1] CRAN (R 4.0.0)
#>  glue           1.4.1    2020-05-13 [1] CRAN (R 4.0.0)
#>  gtable         0.3.0    2019-03-25 [1] CRAN (R 4.0.0)
#>  highr          0.8      2019-03-20 [1] CRAN (R 4.0.0)
#>  htmltools      0.4.0    2019-10-04 [1] CRAN (R 4.0.0)
#>  httr           1.4.1    2019-08-05 [1] CRAN (R 4.0.0)
#>  isoband        0.2.1    2020-04-12 [1] CRAN (R 4.0.0)
#>  knitr          1.28     2020-02-06 [1] CRAN (R 4.0.0)
#>  labeling       0.3      2014-08-23 [1] CRAN (R 4.0.0)
#>  lifecycle      0.2.0    2020-03-06 [1] CRAN (R 4.0.0)
#>  magrittr       1.5      2014-11-22 [1] CRAN (R 4.0.0)
#>  MASS           7.3-51.5 2019-12-20 [1] CRAN (R 4.0.0)
#>  memoise        1.1.0    2017-04-21 [1] CRAN (R 4.0.0)
#>  mime           0.9      2020-02-04 [1] CRAN (R 4.0.0)
#>  munsell        0.5.0    2018-06-12 [1] CRAN (R 4.0.0)
#>  pillar         1.4.4    2020-05-05 [1] CRAN (R 4.0.0)
#>  pkgbuild       1.0.8    2020-05-07 [1] CRAN (R 4.0.0)
#>  pkgconfig      2.0.3    2019-09-22 [1] CRAN (R 4.0.0)
#>  pkgload        1.0.2    2018-10-29 [1] CRAN (R 4.0.0)
#>  prettyunits    1.1.1    2020-01-24 [1] CRAN (R 4.0.0)
#>  processx       3.4.2    2020-02-09 [1] CRAN (R 4.0.0)
#>  ps             1.3.3    2020-05-08 [1] CRAN (R 4.0.0)
#>  purrr          0.3.4    2020-04-17 [1] CRAN (R 4.0.0)
#>  R6             2.4.1    2019-11-12 [1] CRAN (R 4.0.0)
#>  RColorBrewer   1.1-2    2014-12-07 [1] CRAN (R 4.0.0)
#>  Rcpp           1.0.4.6  2020-04-09 [1] CRAN (R 4.0.0)
#>  remotes        2.1.1    2020-02-15 [1] CRAN (R 4.0.0)
#>  rlang          0.4.6    2020-05-02 [1] CRAN (R 4.0.0)
#>  rmarkdown      2.1      2020-01-20 [1] CRAN (R 4.0.0)
#>  rprojroot      1.3-2    2018-01-03 [1] CRAN (R 4.0.0)
#>  scales         1.1.1    2020-05-11 [1] CRAN (R 4.0.0)
#>  sessioninfo    1.1.1    2018-11-05 [1] CRAN (R 4.0.0)
#>  stringi        1.4.6    2020-02-17 [1] CRAN (R 4.0.0)
#>  stringr        1.4.0    2019-02-10 [1] CRAN (R 4.0.0)
#>  testthat       2.3.2    2020-03-02 [1] CRAN (R 4.0.0)
#>  tibble         3.0.1    2020-04-20 [1] CRAN (R 4.0.0)
#>  tidyselect     1.1.0    2020-05-11 [1] CRAN (R 4.0.0)
#>  usethis        1.6.1    2020-04-29 [1] CRAN (R 4.0.0)
#>  vctrs          0.3.0    2020-05-11 [1] CRAN (R 4.0.0)
#>  withr          2.2.0    2020-04-20 [1] CRAN (R 4.0.0)
#>  xfun           0.13     2020-04-13 [1] CRAN (R 4.0.0)
#>  xml2           1.3.2    2020-04-23 [1] CRAN (R 4.0.0)
#>  yaml           2.2.1    2020-02-01 [1] CRAN (R 4.0.0)
#> 
#> [1] /Library/Frameworks/R.framework/Versions/4.0/Resources/library

(Despite all the mystery, it is somewhat reassuring that the contours look vaguely similar)


回答1:


I'm not sure this fully answers your question, but there has been a change in behaviour between ggplot v3.2.1 and v3.3.0 due to the way the contour bins are calculated. In the earlier version, the bins are calculated in StatContour$compute_group, whereas in the later version, StatContour$compute_group delegates this task to the unexported function contour_breaks. In contour_breaks, the bin widths are calculated by the density range divided by bins - 1, whereas in the earlier version they are calculated by the range divided by bins.

We can revert this behaviour by temporarily changing the contour_breaks function:


Before

ggplot() +
  stat_density_2d(data = foo, aes(x, y), bins = 5, color = "black") +
  geom_point(data = foo, aes(x = x, y = y)) +
  geom_polygon(data = df_contours, aes(x = x, y = y, color = prob), fill = NA) +
  scale_color_brewer(name = "Probs", palette = "Set1")

Now change the divisor in contour_breaks from bins - 1 to bins:

my_fun <- ggplot2:::contour_breaks
body(my_fun)[[4]][[3]][[2]][[3]][[3]] <- quote(bins)
assignInNamespace("contour_breaks", my_fun, ns = "ggplot2", pos = "package:ggplot2")

After

Using exactly the same code as produced the first plot:

ggplot() +
  stat_density_2d(data = foo, aes(x, y), bins = 5, color = "black") +
  geom_point(data = foo, aes(x = x, y = y)) +
  geom_polygon(data = df_contours, aes(x = x, y = y, color = prob), fill = NA) +
  scale_color_brewer(name = "Probs", palette = "Set1")



来源:https://stackoverflow.com/questions/61817440/follow-up-to-stat-contour-2d-bins-interpretation

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!