How to create a Marimekko/Mosaic plot in ggplot2

后端 未结 7 741
我寻月下人不归
我寻月下人不归 2020-11-27 02:59

The Marimekko/Mosaic plot is a nice default plot when both x and y are categorical variables. What is the best way to create these using ggplot?

相关标签:
7条回答
  • 2020-11-27 03:07

    Thanks all who created this entry which really helped me as ggmosaic wasn't doing what I wanted (and not labelling axes properly). The nice function from Z.Lin throws a warning sort of explained in https://github.com/tidyverse/ggplot2/issues/3142 which seems to say that warning, which is technically untrue in its content, is really warning us that the ggplotocracy, bless and thank them, feel that geom_bar shouldn't really have variable widths. I guess I see the point so I went for the function from Jake Fisher and tweaked it to my own needs. In case it's useful to others, here it is:

    makeplot_mosaic2 <- function(data, x, y, statDigits = 1, residDigits = 1, pDigits = 3, ...){
      ### from https://stackoverflow.com/questions/19233365/how-to-create-a-marimekko-mosaic-plot-in-ggplot2,
      ### this from Jake Fisher (I think)
      xvar <- deparse(substitute(x))
      yvar <- deparse(substitute(y))
      mydata <- data[c(xvar, yvar)]
      mytable <- table(mydata)
      
      widths <- c(0, cumsum(apply(mytable, 1, sum)))
      heights <- apply(mytable, 1, function(x){c(0, cumsum(x/sum(x)))})
      
      alldata <- data.frame()
      allnames <- data.frame()
      for(i in 1:nrow(mytable)){
        for(j in 1:ncol(mytable)){
          alldata <- rbind(alldata, c(widths[i], widths[i+1], heights[j, i], heights[j+1, i]))
        }
      }
      colnames(alldata) <- c("xmin", "xmax", "ymin", "ymax")
      
      alldata[[xvar]] <- rep(dimnames(mytable)[[1]],rep(ncol(mytable), nrow(mytable)))
      alldata[[yvar]] <- rep(dimnames(mytable)[[2]],nrow(mytable))
      
      chisq <- chisq.test(mytable)
      df <- chisq$parameter
      pval <- chisq$p.value
      chisqval <- chisq$statistic
      # stdResids <- chisq$stdres
      alldata$xcent <- (alldata$xmin + alldata$xmax)/2
      alldata$ycent <- (alldata$ymin + alldata$ymax)/2
      alldata$stdres <- round(as.vector(t(chisq$stdres)), residDigits)
      # print(chisq$stdres)
      # print(alldata)
      
      titleTxt1 <- paste0("Mosaic plot of ",
                          yvar,
                          " against ",
                          xvar,
                          ", ")
      titleTxt2 <- paste0("chisq(",
                         df,
                         ") = ",
                         round(chisqval, statDigits),
                         ", p = ",
                         format.pval(pval, digits = pDigits))
      titleTxt <- paste0(titleTxt1, titleTxt2)
      subTitleTxt <- "Cell labels are standardised residuals"
      
      ggplot(data  = alldata, 
             aes(xmin=xmin, xmax=xmax, ymin=ymin, ymax=ymax)) + 
        geom_rect(color="black", aes_string(fill=yvar)) +
        geom_text(aes(x = xcent, y = ycent, label = stdres)) +
        xlab(paste0("Count of '", 
                    xvar,
                    "', total = ",
                    max(alldata$xmax))) + # tweaked by CE
        ylab(paste0("Proportion of '", 
                    yvar,
                    "' per level of '",
                    xvar,
                    "'")) +
        ggtitle(titleTxt,
                subtitle = subTitleTxt) +
        theme_bw() +
        theme(plot.title = element_text(hjust = .5),
              plot.subtitle = element_text(hjust = .5))
    }
    
    makeplot_mosaic2(mtcars, vs, gear)
    makeplot_mosaic2(diamonds, cut, clarity)
    
    0 讨论(0)
  • 2020-11-27 03:11

    I had the same issue for a project some time back. My solution was to use geom_bar together with the scales="free_x", space="free_x" option in facet_grid to accommodate different bar widths:

    # using diamonds dataset for illustration
    df <- diamonds %>%
      group_by(cut, clarity) %>%
      summarise(count = n()) %>%
      mutate(cut.count = sum(count),
             prop = count/sum(count)) %>%
      ungroup()
    
    ggplot(df,
           aes(x = cut, y = prop, width = cut.count, fill = clarity)) +
      geom_bar(stat = "identity", position = "fill", colour = "black") +
      # geom_text(aes(label = scales::percent(prop)), position = position_stack(vjust = 0.5)) + # if labels are desired
      facet_grid(~cut, scales = "free_x", space = "free_x") +
      scale_fill_brewer(palette = "RdYlGn") +
      # theme(panel.spacing.x = unit(0, "npc")) + # if no spacing preferred between bars
      theme_void() 
    

    0 讨论(0)
  • 2020-11-27 03:19

    I did it myself a time ago, using just geom_bar, I turned it into a general function so it should work on any two factors. enter image description here

    ggMMplot <- function(var1, var2){
      require(ggplot2)
      levVar1 <- length(levels(var1))
      levVar2 <- length(levels(var2))
    
      jointTable <- prop.table(table(var1, var2))
      plotData <- as.data.frame(jointTable)
      plotData$marginVar1 <- prop.table(table(var1))
      plotData$var2Height <- plotData$Freq / plotData$marginVar1
      plotData$var1Center <- c(0, cumsum(plotData$marginVar1)[1:levVar1 -1]) +
        plotData$marginVar1 / 2
    
      ggplot(plotData, aes(var1Center, var2Height)) +
        geom_bar(stat = "identity", aes(width = marginVar1, fill = var2), col = "Black") +
        geom_text(aes(label = as.character(var1), x = var1Center, y = 1.05)) 
      }
    
    ggMMplot(diamonds$cut, diamonds$clarity)
    
    0 讨论(0)
  • 2020-11-27 03:19

    You may use the ggplot2 extension package called "ggmosaic" (https://github.com/haleyjeppson/ggmosaic).

    Extensive tutorial with example code and visual results is given here https://cran.r-project.org/web/packages/ggmosaic/vignettes/ggmosaic.html.

    0 讨论(0)
  • 2020-11-27 03:19

    Plotluck is a library based on ggplot2 that aims at automating the choice of plot type based on characteristics of 1-3 variables. It contains a function for mosaic plots. Example: plotluck(mtcars,vs,gear)

    enter image description here

    0 讨论(0)
  • 2020-11-27 03:22

    A first attempt. I'm not sure how to put the factor labels on the axis though.

    makeplot_mosaic <- function(data, x, y, ...){
      xvar <- deparse(substitute(x))
      yvar <- deparse(substitute(y))
      mydata <- data[c(xvar, yvar)];
      mytable <- table(mydata);
      widths <- c(0, cumsum(apply(mytable, 1, sum)));
      heights <- apply(mytable, 1, function(x){c(0, cumsum(x/sum(x)))});
    
      alldata <- data.frame();
      allnames <- data.frame();
      for(i in 1:nrow(mytable)){
        for(j in 1:ncol(mytable)){
          alldata <- rbind(alldata, c(widths[i], widths[i+1], heights[j, i], heights[j+1, i]));
        }
      }
      colnames(alldata) <- c("xmin", "xmax", "ymin", "ymax")
    
      alldata[[xvar]] <- rep(dimnames(mytable)[[1]],rep(ncol(mytable), nrow(mytable)));
      alldata[[yvar]] <- rep(dimnames(mytable)[[2]],nrow(mytable));
    
      ggplot(alldata, aes(xmin=xmin, xmax=xmax, ymin=ymin, ymax=ymax)) + 
        geom_rect(color="black", aes_string(fill=yvar)) +
        xlab(paste(xvar, "(count)")) + ylab(paste(yvar, "(proportion)"));
    }
    

    Example:

    makeplot_mosaic(mtcars, vs, gear)
    

    example

    0 讨论(0)
提交回复
热议问题