I have been trying to minimize my use of Excel in favor of R, but am still stuck when it comes to display simple data cells as is often needed as the last step of an analysis. T
Myaseen208 has a good start on the answer. I thought I'd fill in a few more pieces: getting color gradient in the red/green you specified, flipping the order of the y-axis, and cleaning up a few other points (gray background and legend).
library("reshape2")
library("ggplot2")
cor_dat <- melt(cor_matrix[-11,])
cor_dat$Var1 <- factor(cor_dat$Var1, levels=item_names)
cor_dat$Var2 <- factor(cor_dat$Var2, levels=rev(item_names))
cor_dat$pctile <- rank(cor_dat$value, na.last="keep")/sum(!is.na(cor_dat$value))
ggplot(data = cor_dat, aes(x = Var1, y = Var2)) +
geom_tile(aes(fill = pctile), colour = "white") +
geom_text(aes(label = sprintf("%1.1f",value)), vjust = 1) +
scale_fill_gradientn(colours=c("red","red","white","green","green"),
values=c(0,0.05,0.5,0.95,1),
guide = "none", na.value = "white") +
coord_equal() +
opts(axis.title.x = theme_blank(),
axis.title.y = theme_blank(),
panel.background = theme_blank())
EDIT:
Now attempting to get the blue size bars at the bottom.
What makes the size bars harder is that they are a completely different representation of different data than the correlation matrix. So I will first try and make just that part separate and then work on putting them together.
Like with the cor data, first the size data is extracted from the matrix and then turned into a data.frame that has the useful values, including the fraction of the total.
size_dat <- melt(cor_matrix[11,,drop=FALSE])
size_dat$Var2 <- factor(size_dat$Var2, levels=item_names)
size_dat$frac <- size_dat$value / max(size_dat$value)
ggplot(data=size_dat, aes(x=Var2, y=Var1)) +
geom_blank() +
geom_rect(aes(xmin = as.numeric(Var2) - 0.5,
xmax = as.numeric(Var2) - 0.5 + frac),
ymin = -Inf, ymax = Inf, fill="blue", color="white") +
coord_equal() +
opts(axis.title.x = theme_blank(),
axis.title.y = theme_blank(),
panel.background = theme_blank())
The geom_rect
call uses some tricks such as using the numeric representation of the categorical (discrete) variable to position things carefully. Each "item" goes from 0.5 below it to 0.5 above it. So the left edge of the rectangle is 0.5 below the item value, and the right edge is frac
to the right of that. Using Inf
and -Inf
for the y limits means go to the extreme of the plot. This gives
Now to try and put them together. The x scale is common, and the y scales can be made common (though disjoint). Playing with levels and orders is necessary. Also, I flipped x and y in the original (which is fine since it is symmetric). Since the data sets are extracted and formatted a little differently, I've renamed them.
cor_dat2 <- melt(cor_matrix[-(nrow(cor_matrix),])
cor_dat2$Var1 <- factor(cor_dat$Var1, levels=rev(c(item_names, "Size")))
cor_dat2$Var2 <- factor(cor_dat$Var2, levels=item_names)
cor_dat2$pctile <- rank(cor_dat$value, na.last="keep")/sum(!is.na(cor_dat$value))
size_dat2 <- melt(cor_matrix["Size",,drop=FALSE])
size_dat2$Var1 <- factor(size_dat$Var1, levels=rev(c(item_names, "Size")))
size_dat2$Var2 <- factor(size_dat$Var2, levels=item_names)
size_dat2$frac <- size_dat$value / max(size_dat$value)
ggplot(data = cor_dat2, aes(x = Var2, y = Var1)) +
geom_tile(aes(fill = pctile), colour = "white") +
geom_text(aes(label = sprintf("%1.1f",value))) +
geom_rect(data=size_dat2,
aes(xmin = as.numeric(Var2) - 0.5,
xmax = as.numeric(Var2) - 0.5 + frac,
ymin = as.numeric(Var1) - 0.5,
ymax = as.numeric(Var1) + 0.5),
fill="lightblue", color="white") +
geom_text(data=size_dat2,
aes(x=Var2, y=Var1, label=sprintf("%.0f", value))) +
scale_fill_gradientn(colours=c("red","red","white","green","green"),
values=c(0,0.05,0.5,0.95,1),
guide = "none", na.value = "white") +
scale_y_discrete(drop = FALSE) +
coord_equal() +
opts(axis.title.x = theme_blank(),
axis.title.y = theme_blank(),
panel.background = theme_blank())
This final version does not assume that it is a 10x10 correlation with an additional row. It can be any number. cor_matrix
must have the right names (and "Size" has to be the last row) and item_names
must contain the list of items. But it doesn't have to be 10.