The Marimekko/Mosaic plot is a nice default plot when both x and y are categorical variables. What is the best way to create these using ggplot?
Thanks to everyone who contributed to this entry, which really helped me, as ggmosaic wasn't doing what I wanted (and wasn't labelling the axes properly). The nice function from Z.Lin throws a warning, which is sort of explained elsewhere. As far as I can tell that warning, although technically untrue in its content, is really telling us that the ggplotocracy, bless and thank them, feel that geom_bar shouldn't have variable widths. I guess I see the point, so I went for the function from Jake Fisher and tweaked it to my own needs. In case it's useful to others, here it is:
library(ggplot2)

makeplot_mosaic2 <- function(data, x, y, statDigits = 1, residDigits = 1, pDigits = 3, ...){
  ### adapted from the function by Jake Fisher (I think)
  xvar <- deparse(substitute(x))
  yvar <- deparse(substitute(y))
  mydata <- data[c(xvar, yvar)]
  mytable <- table(mydata)

  # column edges (cumulative counts of x) and cell edges (cumulative proportions of y within x)
  widths <- c(0, cumsum(apply(mytable, 1, sum)))
  heights <- apply(mytable, 1, function(x){c(0, cumsum(x/sum(x)))})

  # one rectangle per cell of the table, ordered by x level, then y level
  alldata <- data.frame()
  allnames <- data.frame()
  for(i in 1:nrow(mytable)){
    for(j in 1:ncol(mytable)){
      alldata <- rbind(alldata, c(widths[i], widths[i+1], heights[j, i], heights[j+1, i]))
    }
  }
  colnames(alldata) <- c("xmin", "xmax", "ymin", "ymax")

  alldata[[xvar]] <- rep(dimnames(mytable)[[1]], rep(ncol(mytable), nrow(mytable)))
  alldata[[yvar]] <- rep(dimnames(mytable)[[2]], nrow(mytable))

  chisq <- chisq.test(mytable)
  df <- chisq$parameter
  pval <- chisq$p.value
  chisqval <- chisq$statistic
  # stdResids <- chisq$stdres
  alldata$xcent <- (alldata$xmin + alldata$xmax)/2
  alldata$ycent <- (alldata$ymin + alldata$ymax)/2
  # t() puts the residuals in the same order as the rectangles (x level, then y level)
  alldata$stdres <- round(as.vector(t(chisq$stdres)), residDigits)
  # print(chisq$stdres)
  # print(alldata)

  titleTxt1 <- paste0("Mosaic plot of ",
                      yvar,
                      " against ",
                      xvar,
                      ", ")
  titleTxt2 <- paste0("chisq(",
                      df,
                      ") = ",
                      round(chisqval, statDigits),
                      ", p = ",
                      format.pval(pval, digits = pDigits))
  titleTxt <- paste0(titleTxt1, titleTxt2)
  subTitleTxt <- "Cell labels are standardised residuals"

  ggplot(data = alldata,
         aes(xmin=xmin, xmax=xmax, ymin=ymin, ymax=ymax)) +
    # .data[[yvar]] replaces the deprecated aes_string(fill=yvar)
    geom_rect(color="black", aes(fill=.data[[yvar]])) +
    geom_text(aes(x = xcent, y = ycent, label = stdres)) +
    labs(fill = yvar) +
    xlab(paste0("Count of '",
                xvar,
                "', total = ",
                max(alldata$xmax))) + # tweaked by CE
    ylab(paste0("Proportion of '",
                yvar,
                "' per level of '",
                xvar,
                "'")) +
    ggtitle(titleTxt,
            subtitle = subTitleTxt) +
    theme_bw() +
    theme(plot.title = element_text(hjust = .5),
          plot.subtitle = element_text(hjust = .5))
}

makeplot_mosaic2(mtcars, vs, gear)
makeplot_mosaic2(diamonds, cut, clarity)
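
For anyone wanting to see where the rectangle coordinates come from, here is a minimal base-R sketch that computes the same kind of xmin/xmax/ymin/ymax table without the double loop. The names `rects`, `x_level` and `y_level` are made up for this illustration and are not part of the function above; it also assumes the table has at least two columns.

```r
# Contingency table as a plain matrix: rows = x levels, columns = y levels
tab <- unclass(table(mtcars$vs, mtcars$gear))

# Horizontal extent of each column: cumulative counts of the x variable
col_counts <- as.numeric(rowSums(tab))
xmax <- cumsum(col_counts)
xmin <- xmax - col_counts

# Vertical extent of each cell: cumulative proportions within each x level
cond <- prop.table(tab, margin = 1)
ymax <- t(apply(cond, 1, cumsum))
ymin <- ymax - cond

# as.vector() walks a matrix column by column, so the x levels vary fastest
# (hypothetical column names, chosen for this sketch only)
rects <- data.frame(
  x_level = rep(rownames(tab), times = ncol(tab)),
  y_level = rep(colnames(tab), each = nrow(tab)),
  xmin = rep(xmin, times = ncol(tab)),
  xmax = rep(xmax, times = ncol(tab)),
  ymin = as.vector(ymin),
  ymax = as.vector(ymax)
)
```

A data frame like this can be passed to geom_rect in the same way as `alldata` in the function above. Note that the row order differs from the loop version (here the x levels vary fastest), which matters if you attach residuals by position.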
I had the same issue for a project some time back. My solution was to use geom_bar together with the scales = "free_x", space = "free_x" options in facet_grid to accommodate different bar widths:
library(dplyr)
library(ggplot2)

# using diamonds dataset for illustration
df <- diamonds %>%
  group_by(cut, clarity) %>%
  summarise(count = n()) %>%
  mutate(cut.count = sum(count),
         prop = count/sum(count)) %>%
  ungroup()

ggplot(df,
       aes(x = cut, y = prop, width = cut.count, fill = clarity)) +
  geom_bar(stat = "identity", position = "fill", colour = "black") +
  # geom_text(aes(label = scales::percent(prop)), position = position_stack(vjust = 0.5)) + # if labels are desired
  facet_grid(~cut, scales = "free_x", space = "free_x") +
  scale_fill_brewer(palette = "RdYlGn")
  # + theme(panel.spacing.x = unit(0, "npc")) # if no spacing preferred between bars
I did it myself some time ago, using just geom_bar. I turned it into a general function, so it should work on any two factors:
library(ggplot2)

ggMMplot <- function(var1, var2){
  # var1 and var2 must be factors
  levVar1 <- length(levels(var1))
  levVar2 <- length(levels(var2))

  jointTable <- prop.table(table(var1, var2))
  plotData <- as.data.frame(jointTable)
  # marginal proportions of var1, recycled across the levels of var2
  plotData$marginVar1 <- as.numeric(prop.table(table(var1)))
  # proportion of var2 within each level of var1
  plotData$var2Height <- plotData$Freq / plotData$marginVar1
  # x position of the centre of each bar
  plotData$var1Center <- c(0, cumsum(plotData$marginVar1)[seq_len(levVar1 - 1)]) +
    plotData$marginVar1 / 2

  ggplot(plotData, aes(var1Center, var2Height)) +
    geom_bar(stat = "identity", aes(width = marginVar1, fill = var2), col = "Black") +
    geom_text(aes(label = as.character(var1), x = var1Center, y = 1.05))
}

ggMMplot(diamonds$cut, diamonds$clarity)
You may use the ggplot2 extension package called "ggmosaic".
An extensive tutorial with example code and visual results is available online.
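
As a rough idea of what this looks like in practice, here is a minimal sketch. It assumes the ggmosaic package is installed and that `geom_mosaic()` and `product()` behave as described in the package documentation; the choice of the diamonds data and of the variables is only for illustration.

```r
library(ggplot2)
library(ggmosaic)  # assumption: installed from CRAN

# Column widths follow the counts of `cut`; each column is split by `clarity`
p <- ggplot(data = diamonds) +
  geom_mosaic(aes(x = product(cut), fill = clarity)) +
  labs(x = "cut", y = "clarity")

p
```

Axis labelling and argument details have changed between ggmosaic versions, so check the documentation of the version you have installed.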
Plotluck is a library based on ggplot2 that aims at automating the choice of plot type based on characteristics of 1-3 variables. It contains a function for mosaic plots. Example:
A first attempt. I'm not sure how to put the factor labels on the axis though.
library(ggplot2)

makeplot_mosaic <- function(data, x, y, ...){
  xvar <- deparse(substitute(x))
  yvar <- deparse(substitute(y))
  mydata <- data[c(xvar, yvar)]
  mytable <- table(mydata)
  # column edges (cumulative counts of x) and cell edges (cumulative proportions of y within x)
  widths <- c(0, cumsum(apply(mytable, 1, sum)))
  heights <- apply(mytable, 1, function(x){c(0, cumsum(x/sum(x)))})

  # one rectangle per cell of the table, ordered by x level, then y level
  alldata <- data.frame()
  allnames <- data.frame()
  for(i in 1:nrow(mytable)){
    for(j in 1:ncol(mytable)){
      alldata <- rbind(alldata, c(widths[i], widths[i+1], heights[j, i], heights[j+1, i]))
    }
  }
  colnames(alldata) <- c("xmin", "xmax", "ymin", "ymax")

  alldata[[xvar]] <- rep(dimnames(mytable)[[1]], rep(ncol(mytable), nrow(mytable)))
  alldata[[yvar]] <- rep(dimnames(mytable)[[2]], nrow(mytable))

  ggplot(alldata, aes(xmin=xmin, xmax=xmax, ymin=ymin, ymax=ymax)) +
    # .data[[yvar]] replaces the deprecated aes_string(fill=yvar)
    geom_rect(color="black", aes(fill=.data[[yvar]])) +
    labs(fill = yvar) +
    xlab(paste(xvar, "(count)")) + ylab(paste(yvar, "(proportion)"))
}

makeplot_mosaic(mtcars, vs, gear)
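
On the factor labels: one possible approach is to put the x-axis breaks at the horizontal midpoint of each column and use the level names as labels. The sketch below shows the idea on a simplified stand-in plot (one full-height rectangle per level of `vs`); the names `edges`, `centres` and `rects` are made up for this illustration.

```r
library(ggplot2)

# Counts per level of the x variable (here `vs` from mtcars)
counts <- table(mtcars$vs)

# Left and right edges of each column, computed as in makeplot_mosaic
edges <- c(0, cumsum(as.numeric(counts)))

# Midpoint of each column: this is where the level label should sit
centres <- (head(edges, -1) + tail(edges, -1)) / 2

# Simplified stand-in for the mosaic: one rectangle per level of `vs`
rects <- data.frame(xmin = head(edges, -1), xmax = tail(edges, -1),
                    ymin = 0, ymax = 1, vs = names(counts))

p <- ggplot(rects, aes(xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax)) +
  geom_rect(aes(fill = vs), colour = "black") +
  # breaks at the column centres, labelled with the factor levels
  scale_x_continuous(breaks = centres, labels = names(counts))

p
```

Adding the same `scale_x_continuous(breaks = ..., labels = ...)` call to the plot returned by makeplot_mosaic should label its columns in the same way, as long as the breaks are computed from the same table.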