Question
I'm trying to make a hexbin representation of data in several categories. The problem is that faceting these bins seems to make all of them different sizes.
set.seed(1) #Create data
bindata <- data.frame(x=rnorm(100), y=rnorm(100))
fac_probs <- dnorm(seq(-3, 3, length.out=26))
fac_probs <- fac_probs/sum(fac_probs)
bindata$factor <- sample(letters, 100, replace=TRUE, prob=fac_probs)
library(ggplot2) #Actual plotting
library(hexbin)
ggplot(bindata, aes(x=x, y=y)) +
  geom_hex() +
  facet_wrap(~factor)
Is it possible to set something to make all these bins physically the same size?
Answer 1:
As Julius says, the problem is that hexGrob doesn't get the information about the bin sizes, and guesses it from the differences it finds within the facet.
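The guess can be mimicked in base R. The sketch below assumes that the guess boils down to the smallest gap between hexagon centres (the resolution() rule that hexGrob uses, see Answer 3) and replaces ggplot2's resolution() with a simplified stand-in, `resolution_sketch()`, which is hypothetical and follows only the documented rule. The toy centres are made up and lie on one regular hexagonal grid (columns 1 apart, odd rows shifted by 0.5, rows sqrt(3)/2 apart).

```r
# Simplified stand-in for ggplot2's resolution(x, zero = FALSE), following
# only the documented rule: smallest non-zero gap between distinct values,
# or 1 when there is a single unique value. The real function may handle
# further special cases.
resolution_sketch <- function(v) {
  u <- sort(unique(as.numeric(v)))
  if (length(u) < 2) return(1)
  min(diff(u))
}

# Toy hexagon centres taken from ONE common grid.
same_row <- data.frame(x = c(0, 1),   y = c(0, 0))            # horizontal neighbours
diagonal <- data.frame(x = c(0, 0.5), y = c(0, sqrt(3) / 2))  # diagonal neighbours

# Same row: the x gap is a full column and y falls back to 1.
print(c(resolution_sketch(same_row$x), resolution_sketch(same_row$y)))  # 1 1
# Diagonal pair: recovers the half-column offset and the true row spacing.
print(c(resolution_sketch(diagonal$x), resolution_sketch(diagonal$y)))
```

Both facets sit on the same grid, yet only the diagonal pair yields spacings that match it; the same-row facet would get hexagons twice as wide and a height unrelated to the grid.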
Obviously, it would make sense to hand dx and dy to hexGrob: not having the width and height of a hexagon is like specifying a circle by its center without giving the radius.
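To make the circle analogy concrete, here is a minimal base-R sketch of a hexagon built from its centre plus the two size parameters. The helper `hex_vertices()` is hypothetical; its vertex order assumes the offset pattern of hexbin::hexcoords() (x offsets dx, dx, 0, -dx, -dx, 0), which matches the polygon coordinates printed in Answer 3 but is my reading, not something this answer states.

```r
# Hypothetical helper (not part of ggplot2 or hexbin): the six vertices of a
# hexagon centred at (cx, cy). dx is the horizontal half-width, dy is half the
# length of a vertical edge (dy = dx / sqrt(3) for a regular hexagon).
hex_vertices <- function(cx, cy, dx, dy) {
  list(
    x = cx + c(dx, dx, 0, -dx, -dx, 0),
    y = cy + c(dy, -dy, -2 * dy, -dy, dy, 2 * dy)
  )
}

# Same centre, two different size guesses: two different hexagons.
small <- hex_vertices(0, 0, dx = 0.5, dy = 0.5 / sqrt(3))
large <- hex_vertices(0, 0, dx = 1.0, dy = 1.0 / sqrt(3))
print(diff(range(small$x)))  # 1
print(diff(range(large$x)))  # 2
```

The centre alone does not pin down the drawn shape; something has to supply dx and dy.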
Workaround:
The resolution strategy works if the facet contains two adjacent hexagons that differ in both x and y. So, as a workaround, I'll manually construct a data.frame containing the x and y center coordinates of the cells, the factor for faceting, and the counts.
In addition to the libraries specified in the question, I'll need
library (reshape2)
and bindata$factor actually needs to be a factor:
bindata$factor <- as.factor (bindata$factor)
Now, calculate the basic hexagon grid:
h <- hexbin (bindata, xbins = 5, IDs = TRUE,
             xbnds = range (bindata$x),
             ybnds = range (bindata$y))
Next, we need to calculate the counts depending on bindata$factor:
counts <- hexTapply (h, bindata$factor, table)
counts <- t (simplify2array (counts))
counts <- melt (counts)
colnames (counts) <- c ("ID", "factor", "counts")
As we have the cell IDs, we can merge this data.frame with the proper coordinates:
hexdf <- data.frame (hcell2xy (h), ID = h@cell)
hexdf <- merge (counts, hexdf)
Here's what the data.frame looks like:
> head (hexdf)
ID factor counts x y
1 3 e 0 -0.3681728 -1.914359
2 3 s 0 -0.3681728 -1.914359
3 3 y 0 -0.3681728 -1.914359
4 3 r 0 -0.3681728 -1.914359
5 3 p 0 -0.3681728 -1.914359
6 3 o 0 -0.3681728 -1.914359
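For readers without reshape2, the counting-and-merge step can be sketched in base R. This swaps in as.data.frame(table(...)) for the answer's hexTapply() + melt() combination, and all values below (cell IDs, factor labels, centre coordinates) are made-up stand-ins for h@cell, bindata$factor and hcell2xy(h).

```r
# Toy stand-ins (hypothetical values): the hexagon ID of each data point
# and the factor level of that point.
cell_of_point <- c(3, 3, 5, 7, 7, 7)
fac <- factor(c("a", "b", "a", "a", "b", "b"))

# Long format: one row per (ID, factor) combination, zero counts included.
counts <- as.data.frame(table(ID = cell_of_point, factor = fac),
                        responseName = "counts")
counts$ID <- as.integer(as.character(counts$ID))

# Toy centre coordinates for each occupied cell.
centres <- data.frame(ID = c(3L, 5L, 7L), x = c(-0.5, 0.5, 0), y = c(0, 0, 0.87))

# Join on the shared column "ID", as merge (counts, hexdf) does above.
hexdf <- merge(counts, centres)
print(hexdf)
```

As in the real data, a cell that is empty for one level (here ID 5, level "b") still gets a row with counts 0.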
ggplotting this (using the command below) yields the correct bin sizes, but the figure looks a bit weird: 0-count hexagons are drawn, but only where some other facet has this bin populated. To suppress their drawing, we can set the counts there to NA and make the na.value completely transparent (it defaults to grey50):
hexdf$counts [hexdf$counts == 0] <- NA
ggplot(hexdf, aes(x=x, y=y, fill = counts)) +
  geom_hex(stat="identity") +
  facet_wrap(~factor) +
  coord_equal () +
  scale_fill_continuous (low = "grey80", high = "#000040", na.value = "#00000000")
This yields the figure at the top of the post.
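The two ingredients of that trick can be checked in base R: zero counts become NA, and "#00000000" is black with an alpha of 0, i.e. fully transparent. The named toy vector below is a made-up stand-in for the counts of one bin across three facets.

```r
# Toy counts of one bin in three facets; 0 means "empty in this facet".
bin_counts <- c(a = 2L, b = 0L, c = 5L)
bin_counts[bin_counts == 0] <- NA   # empty bins become missing values
print(bin_counts)

# The alpha channel of "#00000000" is 0, so NA bins drawn in this colour
# are invisible.
rgba <- grDevices::col2rgb("#00000000", alpha = TRUE)
print(rgba["alpha", 1])             # 0
```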
This strategy works as long as the binwidths are correct without faceting. If the binwidths are set very small, resolution may still yield too large a dx and dy. In that case, we can supply hexGrob with two adjacent bins (differing in both x and y) with NA counts for each facet:
dummy <- hgridcent (xbins = 5,
                    xbnds = range (bindata$x),
                    ybnds = range (bindata$y),
                    shape = 1)
dummy <- data.frame (ID = 0,
                     factor = rep (levels (bindata$factor), each = 2),
                     counts = NA,
                     x = rep (dummy$x [1] + c (0, dummy$dx/2),
                              nlevels (bindata$factor)),
                     y = rep (dummy$y [1] + c (0, dummy$dy ),
                              nlevels (bindata$factor)))
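Why the dummy pair helps: a resolution-style estimate takes the minimum gap, so two points that differ by dx/2 in x and dy in y cap the estimate at those values in every facet, whatever that facet's own bins look like. A minimal base-R sketch, assuming the simplified `resolution_sketch()` stand-in (hypothetical) and made-up grid values for dx, dy and the first centre, which the answer takes from hgridcent().

```r
# Simplified stand-in for resolution(x, zero = FALSE) (hypothetical helper).
resolution_sketch <- function(v) {
  u <- sort(unique(as.numeric(v)))
  if (length(u) < 2) return(1)
  min(diff(u))
}

# Made-up grid parameters standing in for hgridcent()'s dx, dy and first centre.
dx <- 0.5; dy <- 0.5; x0 <- -2; y0 <- -2

# A sparse facet with a single occupied bin: both estimates fall back to 1.
facet <- data.frame(x = 0.5, y = 0, counts = 3)
print(c(resolution_sketch(facet$x), resolution_sketch(facet$y)))    # 1 1

# Append the two NA-count dummy bins, mirroring the dummy data.frame above.
dummy <- data.frame(x = x0 + c(0, dx / 2), y = y0 + c(0, dy), counts = NA)
padded <- rbind(facet, dummy)
print(c(resolution_sketch(padded$x), resolution_sketch(padded$y)))  # 0.25 0.50
```

Because the NA counts are drawn with a transparent na.value, the dummy bins influence only the size estimate, not the visible plot.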
An additional advantage of this approach is that we can already delete all the rows with 0 counts in counts, in this case reducing the size of hexdf by roughly 3/4 (122 rows instead of 520):
counts <- counts [counts$counts > 0 ,]
hexdf <- data.frame (hcell2xy (h), ID = h@cell)
hexdf <- merge (counts, hexdf)
hexdf <- rbind (hexdf, dummy)
The plot looks exactly the same as above, but you can visualize the difference by making na.value not fully transparent.
More about the problem
The problem is not unique to faceting: it always occurs if too few bins are occupied, so that no "diagonally" adjacent bins are populated.
Here's a series of more minimal data sets that show the problem:
First, I trace hexBin so that I get all center coordinates of the hexagonal grid that ggplot2:::hexBin uses, as well as the object returned by hexbin:
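trace (ggplot2:::hexBin, exit = quote ({
  # all hexagon centres of the grid that hexBin works with
  trace.grid <<- as.data.frame (hgridcent (xbins = xbins, xbnds = xbnds, ybnds = ybnds,
                                           shape = ybins/xbins) [1:2])
  # the hexbin object computed inside hexBin
  trace.h <<- hb
}))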
Set up a very small data set:
df <- data.frame (x = 3 : 1, y = 1 : 3)
And plot:
p <- ggplot(df, aes(x=x, y=y)) + geom_hex(binwidth=c(1, 1)) +
coord_fixed (xlim = c (0, 4), ylim = c (0,4))
p # needed for the tracing to occur
p + geom_point (data = trace.grid, size = 4) +
geom_point (data = df, col = "red") # data pts
str (trace.h)
Formal class 'hexbin' [package "hexbin"] with 16 slots
..@ cell : int [1:3] 3 5 7
..@ count : int [1:3] 1 1 1
..@ xcm : num [1:3] 3 2 1
..@ ycm : num [1:3] 1 2 3
..@ xbins : num 2
..@ shape : num 1
..@ xbnds : num [1:2] 1 3
..@ ybnds : num [1:2] 1 3
..@ dimen : num [1:2] 4 3
..@ n : int 3
..@ ncells: int 3
..@ call : language hexbin(x = x, y = y, xbins = xbins, shape = ybins/xbins, xbnds = xbnds, ybnds = ybnds)
..@ xlab : chr "x"
..@ ylab : chr "y"
..@ cID : NULL
..@ cAtt : int(0)
I repeat the plot, leaving out data point 2:
p <- ggplot(df [-2,], aes(x=x, y=y)) + geom_hex(binwidth=c(1, 1)) + coord_fixed (xlim = c (0, 4), ylim = c (0,4))
p
p + geom_point (data = trace.grid, size = 4) + geom_point (data = df, col = "red")
str (trace.h)
Formal class 'hexbin' [package "hexbin"] with 16 slots
..@ cell : int [1:2] 3 7
..@ count : int [1:2] 1 1
..@ xcm : num [1:2] 3 1
..@ ycm : num [1:2] 1 3
..@ xbins : num 2
..@ shape : num 1
..@ xbnds : num [1:2] 1 3
..@ ybnds : num [1:2] 1 3
..@ dimen : num [1:2] 4 3
..@ n : int 2
..@ ncells: int 2
..@ call : language hexbin(x = x, y = y, xbins = xbins, shape = ybins/xbins, xbnds = xbnds, ybnds = ybnds)
..@ xlab : chr "x"
..@ ylab : chr "y"
..@ cID : NULL
..@ cAtt : int(0)
Note that the results from hexbin are on the same grid: the cell numbers did not change (cell 5 is simply no longer populated and thus not listed), and neither did the grid dimensions and ranges. But the plotted hexagons changed dramatically. Also notice that hgridcent forgets to return the center coordinates of the first (lower-left) cell, even though that cell gets populated:
df <- data.frame (x = 1 : 3, y = 1 : 3)
p <- ggplot(df, aes(x=x, y=y)) + geom_hex(binwidth=c(0.5, 0.8)) +
coord_fixed (xlim = c (0, 4), ylim = c (0,4))
p # needed for the tracing to occur
p + geom_point (data = trace.grid, size = 4) +
geom_point (data = df, col = "red") + # data pts
geom_point (data = as.data.frame (hcell2xy (trace.h)), shape = 1, size = 6)
Here, the rendering of the hexagons cannot possibly be correct: they do not belong to one hexagonal grid.
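The jump between the first two experiments (all three points versus points 1 and 3) can be mimicked with the resolution rule alone. This is a rough sketch under a loud assumption: since the hexagon centres are not printed above, it uses the data coordinates of df (as doubles) as stand-ins for the occupied centres, and the hypothetical `resolution_sketch()` in place of ggplot2's resolution().

```r
# Simplified stand-in for resolution(x, zero = FALSE) (hypothetical helper).
resolution_sketch <- function(v) {
  u <- sort(unique(as.numeric(v)))
  if (length(u) < 2) return(1)
  min(diff(u))
}

# Assumption: occupied centres coincide with the three data points of df.
centres <- data.frame(x = c(3, 2, 1), y = c(1, 2, 3))

all_three <- c(resolution_sketch(centres$x), resolution_sketch(centres$y))
without_2 <- c(resolution_sketch(centres$x[-2]), resolution_sketch(centres$y[-2]))
print(all_three)  # 1 1
print(without_2)  # 2 2
```

Dropping the middle bin leaves the grid untouched but doubles the estimated spacing in both directions, which is consistent with the hexagons changing so much between the two plots.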
Answer 2:
I tried to replicate your example with the same data set using lattice's hexbinplot. Initially, it gave me the error xbnds[1] < xbnds[2] is not fulfilled. This error was due to wrong numeric vectors specifying the range of values that should be covered by the binning. I changed those arguments in hexbinplot, and it somehow worked. I'm not sure whether this helps you solve it with ggplot, but it's probably a starting point.
library(lattice)
library(hexbin)
hexbinplot(y ~ x | factor, bindata, xbnds = "panel", ybnds = "panel", xbins=5,
layout=c(7,3))
EDIT
Rectangular bins with stat_bin2d(), however, work just fine:
ggplot(bindata, aes(x=x, y=y, group=factor)) +
  facet_wrap(~factor) +
  stat_bin2d(binwidth=c(0.6, 0.6))
Answer 3:
There are two source files that we are interested in, stat-binhex.r and geom-hex.r, and mainly the hexBin and hexGrob functions in them.
As @Dinre mentioned, this issue is not really related to faceting. What we can see is that binwidth is not ignored and is used in a special way in hexBin, and that this function is applied to every facet separately. After that, hexGrob is applied to every facet. To make sure, you can inspect them with e.g.
trace(ggplot2:::hexGrob, quote(browser()))
trace(ggplot2:::hexBin, quote(browser()))
This explains why the sizes differ: they depend on both binwidth and the data of each facet itself.
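The per-facet nature of the estimate can be illustrated with split(): all centres come from one common grid, but each facet's centres are measured on their own. The toy centres and facet labels are made up, and `resolution_sketch()` is a hypothetical, simplified stand-in for resolution().

```r
# Simplified stand-in for resolution(x, zero = FALSE) (hypothetical helper).
resolution_sketch <- function(v) {
  u <- sort(unique(as.numeric(v)))
  if (length(u) < 2) return(1)
  min(diff(u))
}

# Toy hexagon centres on ONE grid (columns 1 apart, odd rows shifted by 0.5).
centres <- data.frame(
  facet = c("a", "a", "a", "b", "b", "c"),
  x     = c(0, 0.5, 1, 0, 2, 1),
  y     = c(0, 1,   0, 0, 0, 2)
)

# Estimate the x spacing separately for each facet.
dx_by_facet <- sapply(split(centres$x, centres$facet), resolution_sketch)
print(dx_by_facet)  # a = 0.5, b = 2, c = 1
```

Facet a has diagonal neighbours, facet b only a wide same-row gap, and facet c a single bin, so the same grid produces three different hexagon widths.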
It is difficult to keep track of the process because of various coordinate transforms, but notice that the output of hexBin
data.frame(
hcell2xy(hb),
count = hb@count,
density = hb@count / sum(hb@count, na.rm=TRUE)
)
always seems to look quite ordinary, and that hexGrob, which builds the polygonGrob, is responsible for drawing the hex bins and hence for the distortion. When there is only one hex bin in a facet, there is a more serious anomaly, which comes from these two lines in hexGrob:
dx <- resolution(x, FALSE)
dy <- resolution(y, FALSE) / sqrt(3) / 2 * 1.15
In ?resolution we can see:
Description
The resolution is the smallest non-zero distance between adjacent values. If there is only one unique value, then the resolution is defined to be one.
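Plugging the single-value case into the two hexGrob lines quoted above gives the sizes used for a lone bin. A minimal sketch, assuming the hypothetical `resolution_sketch()` as a simplified stand-in that follows only the documented rule, and a made-up centre for the lone bin.

```r
# Simplified stand-in for resolution(x, zero = FALSE) (hypothetical helper).
resolution_sketch <- function(v) {
  u <- sort(unique(as.numeric(v)))
  if (length(u) < 2) return(1)
  min(diff(u))
}

# A facet with one occupied bin (made-up centre): one unique value per axis.
x <- 0.5
y <- 0.2

# The same arithmetic as the two hexGrob lines.
dx <- resolution_sketch(x)
dy <- resolution_sketch(y) / sqrt(3) / 2 * 1.15
print(dx)            # 1
print(round(dy, 4))  # 0.332
```

Neither value depends on the binwidth or on the data range; both are fixed by the fallback of 1.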
For this reason (resolution(x, FALSE) == 1 and resolution(y, FALSE) == 1), the x coordinates of the polygonGrob of the first facet in your example are
[1] 1.5native 1.5native 0.5native -0.5native -0.5native 0.5native
and, if I am not wrong, in this case native units are like npc, so they should be between 0 and 1. That is, in the case of a single hex bin the polygon goes out of range because of resolution(). This function is also the reason for the distortion that @Dinre mentioned, even with up to several hex bins.
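The polygon x coordinates above can be reproduced in base R under two assumptions: the lone bin is centred at x = 0.5 (which is what the printed numbers imply), and the centre is offset by the hexbin::hexcoords() pattern dx, dx, 0, -dx, -dx, 0 (my reading of hexGrob, not stated in this answer).

```r
dx <- 1    # resolution(x, FALSE) for a single unique value
cx <- 0.5  # assumed centre of the lone bin

# Assumed vertex offsets, following the hexbin::hexcoords() pattern.
vx <- cx + c(dx, dx, 0, -dx, -dx, 0)
print(vx)                       # 1.5 1.5 0.5 -0.5 -0.5 0.5
print(all(vx >= 0 & vx <= 1))   # FALSE: vertices fall outside [0, 1]
```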
So for now there does not seem to be an option to have hex bins of equal size. A temporary workaround (and a very inconvenient one for a large number of factors) could begin with something like this:
library(ggplot2)   # ggplot(), geom_hex()
library(hexbin)    # geom_hex() relies on hexbin
library(gridExtra) # grid.arrange()
set.seed(2)
bindata <- data.frame(x = rnorm(100), y = rnorm(100))
fac_probs <- c(10, 40, 40, 10)
bindata$factor <- sample(letters[1:4], 100,
                         replace = TRUE, prob = fac_probs)
# one hand-tuned binwidth per factor level
binwidths <- list(c(0.4, 0.4), c(0.5, 0.5),
                  c(0.5, 0.5), c(0.4, 0.4))
# one separate plot per level instead of facet_wrap()
plots <- mapply(function(w, z) {
  ggplot(bindata[bindata$factor == w, ], aes(x = x, y = y)) +
    geom_hex(binwidth = z) + theme(legend.position = 'none')
}, letters[1:4], binwidths, SIMPLIFY = FALSE)
do.call(grid.arrange, plots)
Answer 4:
I also did some fiddling around with the hex plots in 'ggplot2', and I was able to consistently produce significant bin distortion when a factor's population was reduced to 8 or below. I can't explain why this is happening without digging down into the package source (which I am reluctant to do), but I can tell you that sparse factors seem to consistently wreck the hex bin plotting in 'ggplot2'.
This suggests to me that the size and shape of a particular hex bin in 'ggplot2' is related to a calculation that is unique to each facet, instead of doing a single calculation for the group and plotting the data afterwards. This is somewhat reinforced by the fact that I can reproduce the distortion in any given facet by plotting only that single factor, like so:
ggplot(bindata[bindata$factor=="e",], aes(x=x, y=y)) +
geom_hex()
This feels like something that should be escalated to the package maintainer, Hadley Wickham (h.wickham at gmail.com). This contact information is publicly available from CRAN.
Update: I sent an email to Hadley Wickham asking if he would take a look at this question, and he confirmed that this behavior is indeed a bug.
Source: https://stackoverflow.com/questions/14495111/setting-hex-bins-in-ggplot2-to-same-size