I have some raw data in 2D, x, y as given below. I want to generate a 2D histogram from the data. Typically, dividing the x,y values into bins of size 0.5, and count the number
For completeness, you can also use the hist2d{gplots}
function. It seems to be the most straightforward for a 2D plot:
library(gplots)
# data is in variable df
# define bin sizes
bin_size <- 0.5
xbins <- (max(df$x) - min(df$x))/bin_size
ybins <- (max(df$y) - min(df$y))/bin_size
# create plot
hist2d(df, same.scale=TRUE, nbins=c(xbins, ybins))
# if you want to retrieve the data for other purposes
df.hist2d <- hist2d(df, same.scale=TRUE, nbins=c(xbins, ybins), show=FALSE)
df.hist2d$counts
Bivariate density estimates can be done with MASS::kde2d
, or KernSmooth::bkde2D
(both supplied with the base R distribution). The latter uses an algorithm based on the fast Fourier transform over a grid of points, and is very fast. The result can be plotted with contour
or persp
or similar functions in other graphing packages.
Using your data:
require(KernSmooth)
z <- bkde2D(df, .5)
persp(z$fhat)
i came to this page from http://www.r-bloggers.com/5-ways-to-do-2d-histograms-in-r/ which lists one of the answers above. It provides code samples for a total of 5 methods:
hist2d from the library gplots
hexbin,hexbinplot from the library hexbin
stat_bin2d from the library ggplot2
kde2d from the library MASS
the "hard way" solution listed above.
The ggplot
is elegant and fast and pretty, as usual. But if you want to use base graphics (image
, contour
, persp
) and display your actual frequencies (instead of the smoothing 2D kernel), you have to first obtain the binnings yourself and create a matrix of frequencies. Here's some code (not necessarily elegant, but pretty robust) that does 2D binning and generates plots somewhat similar to the ones above:
require(mvtnorm)
xy <- rmvnorm(1000,c(5,10),sigma=rbind(c(3,-2),c(-2,3)))
nbins <- 20
x.bin <- seq(floor(min(xy[,1])), ceiling(max(xy[,1])), length=nbins)
y.bin <- seq(floor(min(xy[,2])), ceiling(max(xy[,2])), length=nbins)
freq <- as.data.frame(table(findInterval(xy[,1], x.bin),findInterval(xy[,2], y.bin)))
freq[,1] <- as.numeric(freq[,1])
freq[,2] <- as.numeric(freq[,2])
freq2D <- diag(nbins)*0
freq2D[cbind(freq[,1], freq[,2])] <- freq[,3]
par(mfrow=c(1,2))
image(x.bin, y.bin, freq2D, col=topo.colors(max(freq2D)))
contour(x.bin, y.bin, freq2D, add=TRUE, col=rgb(1,1,1,.7))
palette(rainbow(max(freq2D)))
cols <- (freq2D[-1,-1] + freq2D[-1,-(nbins-1)] + freq2D[-(nbins-1),-(nbins-1)] + freq2D[-(nbins-1),-1])/4
persp(freq2D, col=cols)
For a really fun time, try making an interactive, zoomable, 3D surface:
require(rgl)
surface3d(x.bin,y.bin,freq2D/10, col="red")
freq <- as.data.frame(table(findInterval(xy[,1], x.bin),findInterval(xy[,2], y.bin)))
freq[,1] <- as.numeric(freq[,1])
freq[,2] <- as.numeric(freq[,2])
This is probably wrong since it destroys the original indices.
If you want it with a 2d contour, you can also use the package ggplot2. Some example code is shown in this question:
gradient breaks in a ggplot stat_bin2d plot Adjusted slightly:
x <- rnorm(10000)+5
y <- rnorm(10000)+5
df <- data.frame(x,y)
require(ggplot2)
p <- ggplot(df, aes(x, y))
p <- p + stat_bin2d(bins = 20)
p
Here's the output of the code above: