Plot of a correlation matrix in R like in Excel example

后端 未结 3 1196
执笔经年
执笔经年 2021-02-03 15:21

I have been trying to minimize my use of Excel in favor of R, but am still stuck when it comes to display simple data cells as is often needed as the last step of an analysis. T

3条回答
  •  走了就别回头了
    2021-02-03 15:49

    Myaseen208 has a good start on the answer. I thought I'd fill in a few more pieces: getting color gradient in the red/green you specified, flipping the order of the y-axis, and cleaning up a few other points (gray background and legend).

    library("reshape2")
    library("ggplot2")
    
    cor_dat <- melt(cor_matrix[-11,])
    cor_dat$Var1 <- factor(cor_dat$Var1, levels=item_names)
    cor_dat$Var2 <- factor(cor_dat$Var2, levels=rev(item_names))
    cor_dat$pctile <- rank(cor_dat$value, na.last="keep")/sum(!is.na(cor_dat$value))
    
    ggplot(data =  cor_dat, aes(x = Var1, y = Var2)) +
      geom_tile(aes(fill = pctile), colour = "white") +
      geom_text(aes(label = sprintf("%1.1f",value)), vjust = 1) +
      scale_fill_gradientn(colours=c("red","red","white","green","green"),
                           values=c(0,0.05,0.5,0.95,1),
                           guide = "none", na.value = "white") +
      coord_equal() +
      opts(axis.title.x = theme_blank(),
           axis.title.y = theme_blank(),
           panel.background = theme_blank())
    

    enter image description here

    EDIT:

    Now attempting to get the blue size bars at the bottom.

    What makes the size bars harder is that they are a completely different representation of different data than the correlation matrix. So I will first try and make just that part separate and then work on putting them together.

    Like with the cor data, first the size data is extracted from the matrix and then turned into a data.frame that has the useful values, including the fraction of the total.

    size_dat <- melt(cor_matrix[11,,drop=FALSE])
    size_dat$Var2 <- factor(size_dat$Var2, levels=item_names)
    size_dat$frac <- size_dat$value / max(size_dat$value)
    
    ggplot(data=size_dat, aes(x=Var2, y=Var1)) +
      geom_blank() +
      geom_rect(aes(xmin = as.numeric(Var2) - 0.5, 
                    xmax = as.numeric(Var2) - 0.5 + frac),
                ymin = -Inf, ymax = Inf, fill="blue", color="white")  +
      coord_equal() +
      opts(axis.title.x = theme_blank(),
           axis.title.y = theme_blank(),
           panel.background = theme_blank())
    

    The geom_rect call uses some tricks such as using the numeric representation of the categorical (discrete) variable to position things carefully. Each "item" goes from 0.5 below it to 0.5 above it. So the left edge of the rectangle is 0.5 below the item value, and the right edge is frac to the right of that. Using Inf and -Inf for the y limits means go to the extreme of the plot. This gives

    enter image description here

    Now to try and put them together. The x scale is common, and the y scales can be made common (though disjoint). Playing with levels and orders is necessary. Also, I flipped x and y in the original (which is fine since it is symmetric). Since the data sets are extracted and formatted a little differently, I've renamed them.

    cor_dat2 <- melt(cor_matrix[-(nrow(cor_matrix),])
    cor_dat2$Var1 <- factor(cor_dat$Var1, levels=rev(c(item_names, "Size")))
    cor_dat2$Var2 <- factor(cor_dat$Var2, levels=item_names)
    cor_dat2$pctile <- rank(cor_dat$value, na.last="keep")/sum(!is.na(cor_dat$value))
    
    size_dat2 <- melt(cor_matrix["Size",,drop=FALSE])
    size_dat2$Var1 <- factor(size_dat$Var1, levels=rev(c(item_names, "Size")))
    size_dat2$Var2 <- factor(size_dat$Var2, levels=item_names)
    size_dat2$frac <- size_dat$value / max(size_dat$value)
    
    ggplot(data = cor_dat2, aes(x = Var2, y = Var1)) +
      geom_tile(aes(fill = pctile), colour = "white") +
      geom_text(aes(label = sprintf("%1.1f",value))) +
      geom_rect(data=size_dat2,
                aes(xmin = as.numeric(Var2) - 0.5, 
                    xmax = as.numeric(Var2) - 0.5 + frac,
                    ymin = as.numeric(Var1) - 0.5,
                    ymax = as.numeric(Var1) + 0.5),
                fill="lightblue", color="white")  +
      geom_text(data=size_dat2, 
                aes(x=Var2, y=Var1, label=sprintf("%.0f", value))) +
      scale_fill_gradientn(colours=c("red","red","white","green","green"),
                           values=c(0,0.05,0.5,0.95,1),
                           guide = "none", na.value = "white") +
      scale_y_discrete(drop = FALSE) +
      coord_equal() +
      opts(axis.title.x = theme_blank(),
           axis.title.y = theme_blank(),
           panel.background = theme_blank())
    

    enter image description here

    This final version does not assume that it is a 10x10 correlation with an additional row. It can be any number. cor_matrix must have the right names (and "Size" has to be the last row) and item_names must contain the list of items. But it doesn't have to be 10.

提交回复
热议问题