how to create a heatmap with a fixed external hierarchical cluster

匿名 (未验证) 提交于 2019-12-03 02:19:01

问题:

I have a matrix data, and want to visualize it with heatmap. The rows are species, so I want visualize the phylogenetic tree aside the rows and reorder the rows of the heatmap according the tree. I know the heatmap function in R can create the hierarchical clustering heatmap, but how can I use my phylogenetic clustering instead of the default created distance clustering in the plot?

回答1:

First you need to use package ape to read in your data as a phylo object.

library(ape) dat <- read.tree(file="your/newick/file") #or dat <- read.tree(text="((A:4.2,B:4.2):3.1,C:7.3);") 

The following only works if your tree is ultrametric.

The next step is to transform your phylogenetic tree into class dendrogram.

Here is an example:

data(bird.orders) #This is already a phylo object hc <- as.hclust(bird.orders) #Compulsory step as as.dendrogram doesn't have a method for phylo objects. dend <- as.dendrogram(hc) plot(dend, horiz=TRUE) 

mat <- matrix(rnorm(23*23),nrow=23, dimnames=list(sample(bird.orders$tip, 23), sample(bird.orders$tip, 23))) #Some random data to plot 

First we need to order the matrix according to the order in the phylogenetic tree:

ord.mat <- mat[bird.orders$tip,bird.orders$tip] 

Then input it to heatmap:

heatmap(ord.mat, Rowv=dend, Colv=dend) 

Edit: Here is a function to deal with ultrametric and non-ultrametric trees.

heatmap.phylo <- function(x, Rowp, Colp, ...){     # x numeric matrix     # Rowp: phylogenetic tree (class phylo) to be used in rows     # Colp: phylogenetic tree (class phylo) to be used in columns     # ... additional arguments to be passed to image function     x <- x[Rowp$tip, Colp$tip]     xl <- c(0.5, ncol(x)+0.5)     yl <- c(0.5, nrow(x)+0.5)     layout(matrix(c(0,1,0,2,3,4,0,5,0),nrow=3, byrow=TRUE),                   width=c(1,3,1), height=c(1,3,1))     par(mar=rep(0,4))     plot(Colp, direction="downwards", show.tip.label=FALSE,                xlab="",ylab="", xaxs="i", x.lim=xl)     par(mar=rep(0,4))     plot(Rowp, direction="rightwards", show.tip.label=FALSE,                 xlab="",ylab="", yaxs="i", y.lim=yl)     par(mar=rep(0,4), xpd=TRUE)     image((1:nrow(x))-0.5, (1:ncol(x))-0.5, x,             xaxs="i", yaxs="i", axes=FALSE, xlab="",ylab="", ...)     par(mar=rep(0,4))     plot(NA, axes=FALSE, ylab="", xlab="", yaxs="i", xlim=c(0,2), ylim=yl)     text(rep(0,nrow(x)),1:nrow(x),Rowp$tip, pos=4)     par(mar=rep(0,4))     plot(NA, axes=FALSE, ylab="", xlab="", xaxs="i", ylim=c(0,2), xlim=xl)     text(1:ncol(x),rep(2,ncol(x)),Colp$tip, srt=90, pos=2)     } 

Here is with the previous (ultrametric) example:

heatmap.phylo(mat, bird.orders, bird.orders) 

And with a non-ultrametric:

cat("owls(((Strix_aluco:4.2,Asio_otus:4.2):3.1,Athene_noctua:7.3):6.3,Tyto_alba:13.5);",     file = "ex.tre", sep = "\n") tree.owls <- read.tree("ex.tre") mat2 <- matrix(rnorm(4*4),nrow=4,               dimnames=list(sample(tree.owls$tip,4),sample(tree.owls$tip,4))) is.ultrametric(tree.owls) [1] FALSE heatmap.phylo(mat2,tree.owls,tree.owls) 



回答2:

First, I create a reproducible example. Without data we can just guess what you want. So please try to do better next time(specially you are confirmed user). For example you can do this to create your tree in newick format:

tree.text='(((XXX:4.2,ZZZ:4.2):3.1,HHH:7.3):6.3,AAA:13.6);' 

Like @plannpus, I am using ape to converts this tree to a hclust class. Unfortunatlty, it looks that we can do the conversion only for ultrametric tree: the distance from the root to each tip is the same.

library(ape) tree <- read.tree(text='(((XXX:4.2,ZZZ:4.2):3.1,HHH:7.3):6.3,AAA:13.6);') is.ultrametric(tree) hc <- as.hclust.phylo(tree) 

Then I am using dendrogramGrob from latticeExtra to plot my tree. and levelplot from lattice to draw the heatmap.

library(latticeExtra) dd.col <- as.dendrogram(hc) col.ord <- order.dendrogram(dd.col) mat <- matrix(rnorm(4*4),nrow=4) colnames(mat) <- tree$tip.label rownames(mat) <- tree$tip.label levelplot(mat[tree$tip,tree$tip],type=c('g','p'),           aspect = "fill",           colorkey = list(space = "left"),           legend =             list(right =                    list(fun = dendrogramGrob,                         args =                           list(x = dd.col,                                 side = "right",                                size = 10))),           panel=function(...){             panel.fill('black',alpha=0.2)             panel.levelplot.points(...,cex=12,pch=23)           } ) 



回答3:

I adapted plannapus' answer to deal with more than one tree (also cutting out some options I didn't need in the process):

library(ape)  heatmap.phylo <- function(x, Rowp, Colp, breaks, col, denscol="cyan", respect=F, ...){     # x numeric matrix     # Rowp: phylogenetic tree (class phylo) to be used in rows     # Colp: phylogenetic tree (class phylo) to be used in columns     # ... additional arguments to be passed to image function      scale01 <- function(x, low = min(x), high = max(x)) {         x <- (x - low)/(high - low)         x     }      col.tip <- Colp$tip     n.col <- 1     if (is.null(col.tip)) {         n.col <- length(Colp)         col.tip <- unlist(lapply(Colp, function(t) t$tip))         col.lengths <- unlist(lapply(Colp, function(t) length(t$tip)))         col.fraction <- col.lengths / sum(col.lengths)         col.heights <- unlist(lapply(Colp, function(t) max(node.depth.edgelength(t))))         col.max_height <- max(col.heights)     }      row.tip <- Rowp$tip     n.row <- 1     if (is.null(row.tip)) {         n.row <- length(Rowp)         row.tip <- unlist(lapply(Rowp, function(t) t$tip))         row.lengths <- unlist(lapply(Rowp, function(t) length(t$tip)))         row.fraction <- row.lengths / sum(row.lengths)         row.heights <- unlist(lapply(Rowp, function(t) max(node.depth.edgelength(t))))         row.max_height <- max(row.heights)     }      cexRow <- min(1, 0.2 + 1/log10(n.row))     cexCol <- min(1, 0.2 + 1/log10(n.col))      x <- x[row.tip, col.tip]     xl <- c(0.5, ncol(x)+0.5)     yl <- c(0.5, nrow(x)+0.5)      screen_matrix <- matrix( c(         0,1,4,5,         1,4,4,5,         0,1,1,4,         1,4,1,4,         1,4,0,1,         4,5,1,4     ) / 5, byrow=T, ncol=4 )      if (respect) {         r <- grconvertX(1, from = "inches", to = "ndc") / grconvertY(1, from = "inches", to = "ndc")         if (r < 1) {             screen_matrix <- screen_matrix * matrix( c(r,r,1,1), nrow=6, ncol=4, byrow=T)         } else {             screen_matrix <- screen_matrix * matrix( c(1,1,1/r,1/r), nrow=6, ncol=4, byrow=T)         }     }       split.screen( screen_matrix )      screen(2)     par(mar=rep(0,4))      if (n.col == 1) {         plot(Colp, direction="downwards", show.tip.label=FALSE,xaxs="i", x.lim=xl)     } else {         screens <- split.screen( as.matrix(data.frame( left=cumsum(col.fraction)-col.fraction, right=cumsum(col.fraction), bottom=0, top=1)))         for (i in 1:n.col) {             screen(screens[i])             plot(Colp[[i]], direction="downwards", show.tip.label=FALSE,xaxs="i", x.lim=c(0.5,0.5+col.lengths[i]), y.lim=-col.max_height+col.heights[i]+c(0,col.max_height))         }     }      screen(3)     par(mar=rep(0,4))      if (n.col == 1) {         plot(Rowp, direction="rightwards", show.tip.label=FALSE,yaxs="i", y.lim=yl)     } else {         screens <- split.screen( as.matrix(data.frame( left=0, right=1, bottom=cumsum(row.fraction)-row.fraction, top=cumsum(row.fraction))) )         for (i in 1:n.col) {             screen(screens[i])             plot(Rowp[[i]], direction="rightwards", show.tip.label=FALSE,yaxs="i", x.lim=c(0,row.max_height), y.lim=c(0.5,0.5+row.lengths[i]))         }     }       screen(4)     par(mar=rep(0,4), xpd=TRUE)     image((1:nrow(x))-0.5, (1:ncol(x))-0.5, x, xaxs="i", yaxs="i", axes=FALSE, xlab="",ylab="", breaks=breaks, col=col, ...)      screen(6)     par(mar=rep(0,4))     plot(NA, axes=FALSE, ylab="", xlab="", yaxs="i", xlim=c(0,2), ylim=yl)     text(rep(0,nrow(x)),1:nrow(x),row.tip, pos=4, cex=cexCol)      screen(5)     par(mar=rep(0,4))     plot(NA, axes=FALSE, ylab="", xlab="", xaxs="i", ylim=c(0,2), xlim=xl)     text(1:ncol(x),rep(2,ncol(x)),col.tip, srt=90, adj=c(1,0.5), cex=cexRow)      screen(1)     par(mar = c(2, 2, 1, 1), cex = 0.75)      symkey <- T     tmpbreaks <- breaks     if (symkey) {         max.raw <- max(abs(c(x, breaks)), na.rm = TRUE)         min.raw <- -max.raw         tmpbreaks[1] <- -max(abs(x), na.rm = TRUE)         tmpbreaks[length(tmpbreaks)] <- max(abs(x), na.rm = TRUE)     } else {         min.raw <- min(x, na.rm = TRUE)         max.raw <- max(x, na.rm = TRUE)     }     z <- seq(min.raw, max.raw, length = length(col))      image(z = matrix(z, ncol = 1), col = col, breaks = tmpbreaks,            xaxt = "n", yaxt = "n")     par(usr = c(0, 1, 0, 1))     lv <- pretty(breaks)     xv <- scale01(as.numeric(lv), min.raw, max.raw)     axis(1, at = xv, labels = lv)      h <- hist(x, plot = FALSE, breaks = breaks)     hx <- scale01(breaks, min.raw, max.raw)     hy <- c(h$counts, h$counts[length(h$counts)])     lines(hx, hy/max(hy) * 0.95, lwd = 1, type = "s",            col = denscol)     axis(2, at = pretty(hy)/max(hy) * 0.95, pretty(hy))     par(cex = 0.5)     mtext(side = 2, "Count", line = 2)      close.screen(all.screens = T)  }  tree <- read.tree(text = "(A:1,B:1);((C:1,D:2):2,E:1);((F:1,G:1,H:2):5,((I:1,J:2):2,K:1):1);", comment.char="") N <- sum(unlist(lapply(tree, function(t) length(t$tip))))  set.seed(42) m <- cor(matrix(rnorm(N*N), nrow=N)) rownames(m) <- colnames(m) <- LETTERS[1:N] heatmap.phylo(m, tree, tree, col=bluered(10), breaks=seq(-1,1,length.out=11), respect=T)  


回答4:

This exact application of a heatmap is already implemented in the plot_heatmap function (based on ggplot2) in the phyloseq package, which is openly/freely developed on GitHub. Examples with complete code and results are included here:

http://joey711.github.io/phyloseq/plot_heatmap-examples

One caveat, and not what you are explicitly asking for here, but phyloseq::plot_heatmap does not overlay a hierarchical tree for either axis. There is a good reason not to base your axis ordering on hierarchical clustering -- and this is because of the way indices at the end of long branches can still be next to each other arbitrarily depending on how branches are rotated at the nodes. This point, and an alternative based on non-metric multidimensional scaling is explained further in an article about the NeatMap package, which is also written for R and uses ggplot2. This dimension-reduction (ordination) approach to ordering the indices in a heatmap is adapted for phylogenetic abundance data in phyloseq::plot_heatmap.



回答5:

While my suggestion for phlyoseq::plot_heatmap would get you part of the way there, the powerful "ggtree" package can do this, or more, if representing data on trees is really what you are going for.

Some examples are shown on the top of the following ggtree documentation page:

http://www.bioconductor.org/packages/3.7/bioc/vignettes/ggtree/inst/doc/advanceTreeAnnotation.html

Note that I am not affiliated with ggtree dev at all. Just a fan of the project and what it can already do.



标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!