Setting hex bins in ggplot2 to same size

Posted by 半世苍凉 on 2019-11-27 16:01:03

Question


I'm trying to make a hexbin representation of data in several categories. The problem is that faceting these bins seems to make them all different sizes.

set.seed(1) #Create data
bindata <- data.frame(x=rnorm(100), y=rnorm(100))
fac_probs <- dnorm(seq(-3, 3, length.out=26))
fac_probs <- fac_probs/sum(fac_probs)
bindata$factor <- sample(letters, 100, replace=TRUE, prob=fac_probs)

library(ggplot2) #Actual plotting
library(hexbin)

ggplot(bindata, aes(x=x, y=y)) +
  geom_hex() +
  facet_wrap(~factor)

Is it possible to set something to make all these bins physically the same size?


Answer 1:


As Julius says, the problem is that hexGrob doesn't get the information about the bin sizes, and guesses it from the differences it finds within the facet.

Obviously, it would make sense to hand dx and dy to a hexGrob -- not having the width and height of a hexagon is like specifying a circle by center without giving the radius.
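The guessing step can be illustrated without ggplot2. The following is a minimal sketch, not the package's code: min_gap is a hypothetical helper that approximates what resolution(x, FALSE) computes, and the spacing w is an assumed value.

```r
# Hypothetical helper: smallest gap between unique values,
# falling back to 1 when there is only one unique value.
min_gap <- function(v) {
  u <- sort(unique(v))
  if (length(u) < 2) return(1)
  min(diff(u))
}

w <- 1                          # assumed spacing between bin centers within a row
even_rows <- seq(0, 3, by = w)  # x centers in even rows
odd_rows  <- even_rows + w / 2  # odd rows are shifted by half a bin

min_gap(c(even_rows, odd_rows))  # 0.5: diagonal neighbours are present
min_gap(c(0, 2))                 # 2: sparse facet, estimate is four times too large
min_gap(1)                       # 1: single bin, arbitrary fallback
```

Only the first case recovers the real half-bin offset; the other two show how a sparsely populated facet ends up with inflated hexagons.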

Workaround:

The resolution strategy works if the facet contains two adjacent hexagons that differ in both x and y. So, as a workaround, I'll manually construct a data.frame containing the x and y center coordinates of the cells, the factor for faceting, and the counts:

In addition to the libraries specified in the question, I'll need

library (reshape2)

and also bindata$factor actually needs to be a factor:

bindata$factor <- as.factor (bindata$factor)

Now, calculate the basic hexagon grid

h <- hexbin (bindata, xbins = 5, IDs = TRUE, 
             xbnds = range (bindata$x), 
             ybnds = range (bindata$y))

Next, we need to calculate the counts depending on bindata$factor

counts <- hexTapply (h, bindata$factor, table)
counts <- t (simplify2array (counts))
counts <- melt (counts)
colnames (counts)  <- c ("ID", "factor", "counts")
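As a cross-check on this step, the same per-cell, per-factor counts can be obtained with base R's table(), because hexbin(..., IDs = TRUE) stores each observation's cell number in the cID slot. This is an alternative sketch, not the method used above; the names h2 and counts2 are made up for the illustration.

```r
library(hexbin)

set.seed(1)  # same data as in the question
bindata <- data.frame(x = rnorm(100), y = rnorm(100))
fac_probs <- dnorm(seq(-3, 3, length.out = 26))
fac_probs <- fac_probs / sum(fac_probs)
bindata$factor <- factor(sample(letters, 100, replace = TRUE, prob = fac_probs))

h2 <- hexbin(bindata$x, bindata$y, xbins = 5, IDs = TRUE,
             xbnds = range(bindata$x), ybnds = range(bindata$y))

# h2@cID holds the cell number of each observation, so a two-way table
# of cell number by factor level gives the counts in long format.
counts2 <- as.data.frame(table(ID = h2@cID, factor = bindata$factor),
                         responseName = "counts")
counts2$ID <- as.integer(as.character(counts2$ID))

sum(counts2$counts)  # every observation is counted exactly once
```

The result has one row per combination of occupied cell and factor level, including the zero counts, which is the same shape the melt() call produces.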

As we have the cell IDs, we can merge this data.frame with the proper coordinates:

hexdf <- data.frame (hcell2xy (h),  ID = h@cell)
hexdf <- merge (counts, hexdf)

Here's what the data.frame looks like:

> head (hexdf)
  ID factor counts          x         y
1  3      e      0 -0.3681728 -1.914359
2  3      s      0 -0.3681728 -1.914359
3  3      y      0 -0.3681728 -1.914359
4  3      r      0 -0.3681728 -1.914359
5  3      p      0 -0.3681728 -1.914359
6  3      o      0 -0.3681728 -1.914359

Plotting this with ggplot (use the command below) yields the correct bin sizes, but the figure looks a bit odd: 0-count hexagons are drawn, but only where some other facet has this bin populated. To suppress the drawing, we can set the counts there to NA and make the na.value completely transparent (it defaults to grey50):

hexdf$counts [hexdf$counts == 0] <- NA

ggplot(hexdf, aes(x=x, y=y, fill = counts)) +
  geom_hex(stat="identity") +
  facet_wrap(~factor) +
  coord_equal () +
  scale_fill_continuous (low = "grey80", high = "#000040", na.value = "#00000000")

yields the figure at the top of the post.
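The value "#00000000" is an 8-digit hex color whose last two digits are the alpha channel. A quick base R check, added here only as an illustration, confirms that it is fully transparent:

```r
# Decode the color into red, green, blue and alpha components (0 to 255)
rgba <- grDevices::col2rgb("#00000000", alpha = TRUE)
rgba["alpha", 1]  # 0 means fully transparent
```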

This strategy works as long as the binwidths are correct without faceting. If the binwidths are set very small, the resolution may still yield dx and dy that are too large. In that case, we can supply hexGrob with two adjacent bins (differing in both x and y) with NA counts for each facet.

dummy <- hgridcent (xbins = 5, 
                    xbnds = range (bindata$x),  
                    ybnds = range (bindata$y),  
                    shape = 1)

dummy <- data.frame (ID = 0,
                     factor = rep (levels (bindata$factor), each = 2),
                     counts = NA,
                     x = rep (dummy$x [1] + c (0, dummy$dx/2), 
                              nlevels (bindata$factor)),
                     y = rep (dummy$y [1] + c (0, dummy$dy  ), 
                              nlevels (bindata$factor)))

An additional advantage of this approach is that we can delete all the rows with 0 counts already in counts, in this case reducing the size of hexdf by roughly 3/4 (122 rows instead of 520):

counts <- counts [counts$counts > 0 ,]
hexdf <- data.frame (hcell2xy (h),  ID = h@cell)
hexdf <- merge (counts, hexdf)
hexdf <- rbind (hexdf, dummy)

The plot looks exactly the same as above, but you can visualize the difference with na.value not being fully transparent.


more about the problem

The problem is not unique to faceting but always occurs when too few bins are occupied, so that no "diagonally" adjacent bins are populated.
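One way to spot affected facets ahead of plotting is to compare the smallest gap between bin centers within each facet against the smallest gap over the whole grid. The sketch below uses a made-up toy data frame (centers) and a hypothetical helper (min_gap); it is a diagnostic idea, not something ggplot2 provides.

```r
# Hypothetical toy data: bin centers that all lie on one hexagonal grid
centers <- data.frame(
  facet = c("a", "a", "a", "b", "b", "c"),
  x     = c(0, 0.5, 1, 0, 2, 1),
  y     = c(0, 1,   0, 0, 0, 1)
)

# Smallest gap between unique values; NA when it cannot be estimated
min_gap <- function(v) {
  u <- sort(unique(v))
  if (length(u) < 2) return(NA_real_)
  min(diff(u))
}

gap_x <- tapply(centers$x, centers$facet, min_gap)
gap_y <- tapply(centers$y, centers$facet, min_gap)

# A facet is suspect if either gap is missing or differs from the
# whole-grid gap (exact comparison is fine for these toy values;
# real data would need a tolerance).
suspect <- is.na(gap_x) | is.na(gap_y) |
  gap_x != min_gap(centers$x) | gap_y != min_gap(centers$y)
names(suspect)[suspect]  # "b" "c"
```

Facet "a" contains a diagonally adjacent pair and passes; "b" has two bins in the same row and "c" a single bin, so both would be drawn with the wrong size.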

Here's a series of more minimal data that shows the problem:

First, I trace hexBin so that I capture both the center coordinates of the hexagonal grid that ggplot2:::hexBin uses and the object returned by hexbin:

trace (ggplot2:::hexBin, exit = quote ({
  trace.grid <<- as.data.frame (hgridcent (xbins = xbins, xbnds = xbnds,
                                           ybnds = ybnds, shape = ybins/xbins) [1:2])
  trace.h <<- hb
}))

Set up a very small data set:

df <- data.frame (x = 3 : 1, y = 1 : 3)

And plot:

p <- ggplot(df, aes(x=x, y=y)) +  geom_hex(binwidth=c(1, 1)) +          
     coord_fixed (xlim = c (0, 4), ylim = c (0,4))

p # needed for the tracing to occur
p + geom_point (data = trace.grid, size = 4) + 
    geom_point (data = df, col = "red") # data pts

str (trace.h)

Formal class 'hexbin' [package "hexbin"] with 16 slots
  ..@ cell  : int [1:3] 3 5 7
  ..@ count : int [1:3] 1 1 1
  ..@ xcm   : num [1:3] 3 2 1
  ..@ ycm   : num [1:3] 1 2 3
  ..@ xbins : num 2
  ..@ shape : num 1
  ..@ xbnds : num [1:2] 1 3
  ..@ ybnds : num [1:2] 1 3
  ..@ dimen : num [1:2] 4 3
  ..@ n     : int 3
  ..@ ncells: int 3
  ..@ call  : language hexbin(x = x, y = y, xbins = xbins, shape = ybins/xbins, xbnds = xbnds, ybnds = ybnds)
  ..@ xlab  : chr "x"
  ..@ ylab  : chr "y"
  ..@ cID   : NULL
  ..@ cAtt  : int(0) 

I repeat the plot, leaving out data point 2:

p <- ggplot(df [-2,], aes(x=x, y=y)) +  geom_hex(binwidth=c(1, 1)) +          coord_fixed (xlim = c (0, 4), ylim = c (0,4))
p
p + geom_point (data = trace.grid, size = 4) + geom_point (data = df, col = "red")
str (trace.h)

Formal class 'hexbin' [package "hexbin"] with 16 slots
  ..@ cell  : int [1:2] 3 7
  ..@ count : int [1:2] 1 1
  ..@ xcm   : num [1:2] 3 1
  ..@ ycm   : num [1:2] 1 3
  ..@ xbins : num 2
  ..@ shape : num 1
  ..@ xbnds : num [1:2] 1 3
  ..@ ybnds : num [1:2] 1 3
  ..@ dimen : num [1:2] 4 3
  ..@ n     : int 2
  ..@ ncells: int 2
  ..@ call  : language hexbin(x = x, y = y, xbins = xbins, shape = ybins/xbins, xbnds = xbnds, ybnds = ybnds)
  ..@ xlab  : chr "x"
  ..@ ylab  : chr "y"
  ..@ cID   : NULL
  ..@ cAtt  : int(0) 

  • Note that the results from hexbin are on the same grid: the cell numbers did not change (cell 5 is simply no longer populated and thus not listed), and the grid dimensions and ranges did not change. But the plotted hexagons did change dramatically.

  • Also notice that hgridcent forgets to return the center coordinates of the first cell (lower left).

That cell can be populated, though:

df <- data.frame (x = 1 : 3, y = 1 : 3)

p <- ggplot(df, aes(x=x, y=y)) +  geom_hex(binwidth=c(0.5, 0.8)) +          
     coord_fixed (xlim = c (0, 4), ylim = c (0,4))

p # needed for the tracing to occur
p + geom_point (data = trace.grid, size = 4) + 
    geom_point (data = df, col = "red") + # data pts
    geom_point (data = as.data.frame (hcell2xy (trace.h)), shape = 1, size = 6)

Here, the rendering of the hexagons cannot possibly be correct - they do not belong to one hexagonal grid.




Answer 2:


I tried to replicate your solution with the same data set using lattice hexbinplot. Initially, it gave me the error "xbnds[1] < xbnds[2] is not fulfilled". This error was due to incorrect numeric vectors specifying the range of values to be covered by the binning. I changed those arguments in hexbinplot, and it somehow worked. I'm not sure whether it helps you solve this with ggplot, but it's probably a starting point.

library(lattice)
library(hexbin)
hexbinplot(y ~ x | factor, bindata, xbnds = "panel", ybnds = "panel", xbins=5, 
           layout=c(7,3))

EDIT

Although rectangular bins with stat_bin2d() work just fine:

ggplot(bindata, aes(x=x, y=y, group=factor)) + 
    facet_wrap(~factor) +
    stat_bin2d(binwidth=c(0.6, 0.6))




Answer 3:


There are two source files that we are interested in: stat-binhex.r and geom-hex.r, mainly hexBin and hexGrob functions.

As @Dinre mentioned, this issue is not really related to faceting. What we can see is that binwidth is not ignored but is used in a special way in hexBin, which is applied to every facet separately. After that, hexGrob is applied to every facet. To verify this, you can inspect them with e.g.

trace(ggplot2:::hexGrob, quote(browser()))
trace(ggplot2:::hexBin, quote(browser()))

Hence this explains why sizes differ - they depend on both binwidth and the data of each facet itself.

It is difficult to keep track of the process because of the various coordinate transforms, but notice that the output of hexBin

data.frame(
  hcell2xy(hb),
  count = hb@count,
  density = hb@count / sum(hb@count, na.rm=TRUE)
)

always seems to look quite ordinary, and that hexGrob is responsible for drawing the hex bins and hence for the distortion, i.e. it is the function that contains polygonGrob. When there is only one hex bin in a facet, the anomaly is more serious.

dx <- resolution(x, FALSE)
dy <- resolution(y, FALSE) / sqrt(3) / 2 * 1.15

In ?resolution we can see

Description

 The resolution is the smallest non-zero distance between adjacent
 values. If there is only one unique value, then the resolution is
 defined to be one.

For this reason (resolution(x, FALSE) == 1 and resolution(y, FALSE) == 1), the x coordinates of the polygonGrob for the first facet in your example are

[1] 1.5native  1.5native  0.5native  -0.5native -0.5native 0.5native 

and if I am not wrong, in this case native units are like npc, so they should be between 0 and 1. That is, in the case of a single hex bin, the polygon goes out of range because of resolution(). This function is also the reason for the distortion that @Dinre mentioned, even with up to several hex bins.
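The printed coordinates can be reproduced by hand. This is a sketch under two assumptions that the answer does not state: the bin center sits at x = 0.5, and the vertex offsets follow the pattern (dx, dx, 0, -dx, -dx, 0), modeled on what hexbin::hexcoords appears to do.

```r
dx <- 1          # resolution() fallback for a single unique value
center_x <- 0.5  # assumed bin center, chosen to match the output above

# Assumed x offsets of the six hexagon vertices relative to the center
vertex_x <- center_x + dx * c(1, 1, 0, -1, -1, 0)
vertex_x         # 1.5 1.5 0.5 -0.5 -0.5 0.5
range(vertex_x)  # -0.5 1.5, i.e. wider than the 0 to 1 interval
```

With the fallback value dx = 1 the hexagon spans two units, twice the width of the 0 to 1 range the answer expects, which is consistent with the hexagon overflowing the panel.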

So for now there does not seem to be an option to have hex bins of equal size. A temporary (and, for a large number of factors, very inconvenient) solution could begin with something like this:

library(gridExtra)
set.seed(2)
bindata <- data.frame(x = rnorm(100), y = rnorm(100))
fac_probs <- c(10, 40, 40, 10)
bindata$factor <- sample(letters[1:4], 100, 
                         replace = TRUE, prob = fac_probs)

binwidths <- list(c(0.4, 0.4), c(0.5, 0.5),
                  c(0.5, 0.5), c(0.4, 0.4))

plots <- mapply(function(w,z){
  ggplot(bindata[bindata$factor == w, ], aes(x = x, y = y)) +
    geom_hex(binwidth = z) + theme(legend.position = 'none')
}, letters[1:4], binwidths, SIMPLIFY = FALSE)

do.call(grid.arrange, plots)




Answer 4:


I also did some fiddling around with the hex plots in 'ggplot2', and I was able to consistently produce significant bin distortion when a factor's population was reduced to 8 or below. I can't explain why this is happening without digging down into the package source (which I am reluctant to do), but I can tell you that sparse factors seem to consistently wreck the hex bin plotting in 'ggplot2'.

This suggests to me that the size and shape of a particular hex bin in 'ggplot2' is related to a calculation that is unique to each facet, instead of doing a single calculation for the group and plotting the data afterwards. This is somewhat reinforced by the fact that I can reproduce the distortion in any given facet by plotting only that single factor, like so:

ggplot(bindata[bindata$factor=="e",], aes(x=x, y=y)) +
geom_hex()

This feels like something that should be elevated to the package maintainer, Hadley Wickham (h.wickham at gmail.com). This info is publicly available from CRAN.

Update: I sent an email to Hadley Wickham asking if he would take a look at this question, and he confirmed that this behavior is indeed a bug.



Source: https://stackoverflow.com/questions/14495111/setting-hex-bins-in-ggplot2-to-same-size
