Ternary plot - scaling opacity across groups

痞子三分冷 提交于 2021-01-29 19:37:29

问题


I am trying to make a contour ternary plot, with two groups, where the opacity (alpha) of the contours is a function of the density of the points (e.g. more points tightly clustered = less opaque).

I am stuck on one point. My two groups (here A and B) have unequal group sizes (A = 150 obs, B = 50 obs), this means points in one group are frequently alot more clustered, when this is the case the opacity of group B should be much higher relative to group A, since points in group B are much less dense. But it looks like opacity is scaled within groups instead of across groups.

My Question: is it possible to scale opacity to the density of points, where density is relative across both groups?

An example:

library(ggtern)

set.seed(1234)

# example data
df <- data.frame(X = c(runif(150, 0.7, 1),runif(50, 0, 0.3)),
                 Y = c(runif(150, 0, 0.3),runif(50, 0, 0.3)),
                 Z = c(runif(150, 0, 0.5),runif(50, 0.5, 1)),
                 D = c(rep("A", 150), rep("B", 50)))


# ternary plot 
ggtern(df, aes(x = X,y = Y, z = Z, color = D)) +
  stat_density_tern(aes(alpha = ..level.., fill = D), 
                    geom = 'polygon', 
                    bins = 10,
                    color = "grey") +
  geom_point(alpha = 0.5) +
  scale_colour_manual(values = c("tomato3", "turquoise4"))

# points are only displayed to show densities, I don't plan on showing 
# points in the final plot

Given group B points are much less dense I would expect the contours to be more opaque than group A.

Another option would be to use scale_colour_gradient(), but I can't see how to get two separate gradients (one for each of A and B) on a single plot.


回答1:


I wished I would have an easier answer for you, but alas, I have not. However, I've found a quite hacky solution to your problem, by making a new stat and predefining breaks. Disclaimer: I don't use ggtern myself, so I don't know much about the specifics. The problem in general seems to be that the density is computed per group and the integral of densities are generally set to 1. We can solve this by adding a new stat that scales this for us.

The solution then seems deceivingly simple: multiply the calculated densities by the number of datapoints in the group, to get a density scaled to reflects group sizes. The only drawback would be that we have to change bins = 10, which is calculated per group, with breaks = seq(start, end, by = somenumber) to have absolute instead of relative breaks for the contours.

However, ggtern is quite the complicated package with it's own peculiarities that make it difficult to write a new stat function to work. There exists a list with 'approved stats' and ggtern will remove any layers that don't have their approval.

ggtern:::.approvedstat
             identity            confidence          density_tern           smooth_tern 
       "StatIdentity"  "StatConfidenceTern"     "StatDensityTern"      "StatSmoothTern" 
                  sum                unique      interpolate_tern          mean_ellipse 
            "StatSum"          "StatUnique" "StatInterpolateTern"     "StatMeanEllipse" 
             hex_tern              tri_tern
        "StatHexTern"         "StatTriTern"

So the first order of business would be to add an entry for our own stat (which we'll call StatDensityTern2) to the approved stat list, but since this .approvedstat is in the package namespace, we'll have to be a bit hacky to do this:

approveupdate <- c(ggtern:::.approvedstat, "density_tern2" = "StatDensityTern2")
assignInNamespace(".approvedstat", approveupdate, pos = "package:ggtern")

Now we can write our own StatDensityTern2, that inherits functionality from StatDensityTern, with a small update in how groups are computed. While writing this new stat, we need to take care that we load necessary packages and refer to internal functions correctly. We'll largely copy-paste from the existing StatDensityTern$compute_group, but make a small adjustment to change z = as.vector(dens$z) to z = as.vector(dens$z) * nrow(data) before passing on the data to the contour function.

library(compositions)
library(rlang)

StatDensityTern2 <-
  ggproto(
    "StatDensityTern2",
    StatDensityTern,
    compute_group = function(
      self, data, scales, na.rm = FALSE, n = 100, h = NULL,
      bdl = 0, bdl.val = NA, contour = TRUE, base = "ilr", expand = 0.5,
      weight = NULL, bins = NULL, binwidth = NULL, breaks = NULL
    ) {
      if (!c(base) %in% c("identity", "ilr")) 
        stop("base must be either identity or ilr", call. = FALSE)
      raes = self$required_aes
      data[raes] = suppressWarnings(compositions::acomp(data[raes]))
      data[raes][data[raes] <= bdl] = bdl.val[1]
      data = remove_missing(data, vars = self$required_aes, na.rm = na.rm, 
                            name = "StatDensityTern", finite = TRUE)
      if (ggplot2:::empty(data)) 
        return(data.frame())
      coord = coord_tern()
      f = get(base, mode = "function")
      fInv = get(sprintf("%sInv", base), mode = "function")
      if (base == "identity") 
        data = tlr2xy(data, coord, inverse = FALSE, scale = TRUE)
      h = h %||% ggtern:::estimateBandwidth(base, data[which(colnames(data) %in% 
                                                      raes)])
      if (length(h) != 2) 
        h = rep(h[1], 2)
      if (base != "identity" && diff(h) != 0) 
        warning("bandwidth 'h' has different x and y bandwiths for 'ilr', this may (probably will) introduce permutational artifacts depending on the ordering", 
                call. = FALSE)
      data[raes[1:2]] = suppressWarnings(f(as.matrix(data[which(colnames(data) %in% 
                                                                  raes)])))
      expand = if (length(expand) != 2) 
        rep(expand[1], 2)
      else expand
      rngxy = range(c(data$x, data$y))
      rngx = scales:::expand_range(switch(base, identity = coord$limits$x, 
                                 rngxy), expand[1])
      rngy = scales:::expand_range(switch(base, identity = coord$limits$y, 
                                 rngxy), expand[2])
      dens = ggtern:::kde2d.weighted(data$x, data$y, h = h, n = n, lims = c(rngx, 
                                                                   rngy), w = data$weight)

# Here be relevant changes ------------------------------------------------


      df = data.frame(expand.grid(x = dens$x, y = dens$y), 
                      z = as.vector(dens$z) * nrow(data), 
                      group = data$group[1])

# Here end relevant changes -----------------------------------------------


      if (contour) {
        df = StatContour$compute_panel(df, scales, bins = bins, 
                                       binwidth = binwidth, breaks = breaks)
      }
      else {
        names(df) <- c("x", "y", "density", "group")
        df$level <- 1
        df$piece <- 1
      }
      if (base == "identity") 
        df = tlr2xy(df, coord, inverse = TRUE, scale = TRUE)
      df[raes] = suppressWarnings(fInv(as.matrix(df[which(colnames(df) %in% 
                                                            raes)])))
      df
    }
  )

Now that we've written a new stat and have approved of the stat ourselves, we can use it in the following manner:

set.seed(1234)

# example data
df <- data.frame(X = c(runif(150, 0.7, 1),runif(50, 0, 0.3)),
                 Y = c(runif(150, 0, 0.3),runif(50, 0, 0.3)),
                 Z = c(runif(150, 0, 0.5),runif(50, 0.5, 1)),
                 D = c(rep("A", 150), rep("B", 50)))

ggtern(df, aes(x = X, y = Y, z = Z, color = D)) +
  geom_polygon(aes(alpha = ..level.., fill = D),
               stat = "DensityTern2",
               breaks = seq(10, 150, by = 10),
               color = "grey") +
  geom_point(alpha = 0.5) +
  scale_colour_manual(values = c("tomato3", "turquoise4"))

Which gave me the following plot:

Hope you found this useful!



来源:https://stackoverflow.com/questions/57175114/ternary-plot-scaling-opacity-across-groups

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!