How do you make a heat map and cluster with NA values?

Question

I am trying to make a heat map from my data, but I am struggling to code it properly.
My matrix is filled with log(x+1) values, so I don't run into log(0) errors. However, because of the nature of my data I have a lot of 0 values, and they mask any trends the heat map could be showing. Because of that, I want to colour any 0 values grey or black and colour the rest of my data along a blue-white-red spectrum.

Here is the code I am using:

library(gplots)  # provides heatmap.2()

RHeatmap <- read.delim("~/Desktop/RHeatmap.txt", row.names = 1, stringsAsFactors = FALSE)

my_palette <- colorRampPalette(c("blue", "white", "red"))(n = 20)
RHeatmap.matrix <- as.matrix(RHeatmap)
RHeatmap.matrix[RHeatmap.matrix == 0] <- NA  # mark zeros as missing

heatmap.2(RHeatmap.matrix, trace = "none", col = my_palette, margins = c(5, 1),
          scale = "none", symbreaks = FALSE, Colv = TRUE, dendrogram = "both",
          lwid = c(1.5, 2.0))

When looking online for how to give the 0 values a separate colour, I noticed people set them to NA, which can then be coded to appear in a certain colour. Question 1: How would I do that?

I was also wondering how to cluster with NA values. When I tried, I received an error saying that you can't cluster with NA values.

Answer 1:

To get this to work, you need to specify the breaks. Note: there needs to be one more break than there are colors.

library(gplots)

set.seed(1)  # make the random example reproducible
dat <- matrix(2^rnorm(900, sd = 5), ncol = 9)
dat[sample(seq_along(dat), size = 180)] <- 0  # set some data to 0

dat2 <- log2(dat + 1)
dat2[dat2 == 0] <- NA  # zeros become NA and are drawn in na.color

my_palette <- colorRampPalette(c("yellow", "orange", "red"))(n = 20)
# breaks must be computed after dat2 exists: 21 breaks for 20 colors
breaks <- seq(min(dat2, na.rm = TRUE), max(dat2, na.rm = TRUE), length.out = 21)

heatmap.2(dat2, trace = "none", na.color = "black", scale = "none",
          col = my_palette, breaks = breaks)
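
For the clustering part of the question, one common workaround, sketched here as an assumption about what you want rather than the only option, is to build the dendrograms from the log-transformed matrix with its zeros still in place (so dist() and hclust() never see an NA), pass them to heatmap.2 through Rowv and Colv, and plot the copy in which zeros are NA. The names dat_log, dat_na, row_dend and col_dend are illustrative. The heatmap.2 call is wrapped in requireNamespace() so the data-preparation part still runs without gplots installed.

```r
# Sketch: cluster on the complete (zero-filled) matrix, display the NA copy.
set.seed(1)  # reproducible example data
dat <- matrix(2^rnorm(900, sd = 5), ncol = 9)
dat[sample(seq_along(dat), size = 180)] <- 0

dat_log <- log2(dat + 1)  # zeros stay 0 here, so there are no NAs

# Dendrograms computed from the complete matrix (rows, then columns)
row_dend <- as.dendrogram(hclust(dist(dat_log)))
col_dend <- as.dendrogram(hclust(dist(t(dat_log))))

# Display copy: zeros become NA so heatmap.2 paints them with na.color
dat_na <- dat_log
dat_na[dat_na == 0] <- NA

my_palette <- colorRampPalette(c("yellow", "orange", "red"))(n = 20)
# One more break than colors, spanning the non-NA range
breaks <- seq(min(dat_na, na.rm = TRUE), max(dat_na, na.rm = TRUE),
              length.out = length(my_palette) + 1)

if (requireNamespace("gplots", quietly = TRUE)) {
  gplots::heatmap.2(dat_na, Rowv = row_dend, Colv = col_dend,
                    dendrogram = "both", trace = "none", scale = "none",
                    na.color = "black", col = my_palette, breaks = breaks)
}
```

Because the dendrograms come from the zero-filled data, the clustering treats 0 as a real value; that only makes sense if the zeros are meaningful measurements rather than missing data.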

My two cents about your more general visualization question:

1) All of your data is at or above 0, so I would recommend using a sequential color map, not a divergent one. White tends to be read as 0; in this case I see white and automatically think it is 0.

2) Your current heatmap looks good to me, i.e. well clustered and represented (color map aside). I'm not sure how much "better" it could get or what "better" would look like.

3) If your data has 0s in it, I would keep them, so long as they are meaningful. This is very data-dependent.

4) You could look into different distance metrics that may treat/weight 0 entries differently.

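5) Setting 0s to NA will change the clustering, because by default dist drops, for each pair of rows, every column in which either value is NA (and scales the remaining sum up to compensate). If two rows have no non-missing column in common, their distance is NA and hclust stops with an error. See ?dist for more info.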
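
A small illustration of point 5, using a hypothetical toy matrix m (rows s1 to s4 are made-up samples): dist() computes each distance only from the columns both rows have, rescales it, returns NA when there is no overlap, and that NA is what makes hclust() fail.

```r
# Hypothetical toy matrix: 4 samples x 3 columns, with some NAs
m <- rbind(s1 = c(1, 2, NA),
           s2 = c(1, 4, 6),
           s3 = c(NA, NA, 5),
           s4 = c(3, NA, NA))

dm <- as.matrix(dist(m))  # Euclidean distance by default

# s1 vs s2: only columns 1 and 2 are usable. Sum of squares = 0 + 4 = 4,
# scaled up by 3 total / 2 used columns to 6, so the distance is sqrt(6).
print(dm["s1", "s2"])

# s1 vs s3 have no non-missing column in common, so their distance is NA.
print(dm["s1", "s3"])

# hclust() does not accept NA dissimilarities and stops with an error.
res <- try(hclust(dist(m)), silent = TRUE)
print(inherits(res, "try-error"))
```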

Source: https://stackoverflow.com/questions/41291051/how-do-you-make-a-heat-map-and-cluster-with-na-values

Tags

heatmap

missing-data