Understanding heatmap dendrogram clustering in R

你的背包 2021-01-07 03:45

I would appreciate any information or reading material on the dendrograms (Colv, Rowv) of R's heatmap function, such as how the clustering works (is it Euclidean distance?). You don't have t

3 answers
  • 2021-01-07 04:02

    Rowv and Colv control whether the rows and columns of your data set should be reordered and if so how.

    The possible values for them are TRUE, NULL, FALSE, a vector of integers, or a dendrogram object.

    • In the default mode, TRUE, heatmap.2 performs clustering using the hclustfun and distfun parameters. These default to complete-linkage clustering with a Euclidean distance measure. The dendrogram is then reordered using the row/column means. You can change this by passing different functions to hclustfun or distfun. For example, to use the Manhattan distance rather than the Euclidean distance:

      heatmap.2(x, ..., distfun = function(y) dist(y, method = "manhattan"))
      

      See ?dist and ?hclust. If you want to learn more about clustering, good starting points are "distance measures" and "agglomeration methods".

    • If Rowv/Colv is NULL or FALSE then no reordering or clustering is done and the matrix is plotted as-is.

    • If Rowv/Colv is a numeric vector, the clustering is computed as for TRUE, and the supplied vector is used in place of the row/column means as the weights for reordering the dendrogram.

    • If Rowv/Colv is a dendrogram object, then this dendrogram will be used to reorder the matrix. Dendrogram objects can be generated, for example, by:

      rowDistance = dist(x, method = "manhattan")
      rowCluster = hclust(rowDistance, method = "complete")
      rowDend = as.dendrogram(rowCluster)
      rowDend = reorder(rowDend, rowMeans(x))
      

      which generates a complete-linkage clustering on the Manhattan distance, ordered by row means. You can now pass rowDend to Rowv:

      heatmap.2(x,...,Rowv = rowDend)
      

      This can be useful if, for example, you want to cluster the rows and columns in different ways, use a clustering that someone else has given you, or do something that cannot be accommodated by just specifying hclustfun and distfun. This is what is meant by "the dendrogram is honoured": it is used instead of whatever hclustfun and distfun would produce.
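
The bullets above can be sketched with base R alone. This is a minimal sketch, not heatmap.2 itself: it assumes stats::heatmap (which also takes Rowv, distfun and hclustfun) so that it runs without the gplots package, and the matrix x is made-up toy data.

```r
# Toy data (hypothetical): 8 rows, 5 columns.
set.seed(1)
x <- matrix(rnorm(40), nrow = 8,
            dimnames = list(paste0("r", 1:8), paste0("c", 1:5)))

# Default pipeline rebuilt by hand: Euclidean distance, complete linkage,
# then the dendrogram is reordered by row means.
default_dend <- reorder(as.dendrogram(hclust(dist(x))), rowMeans(x))

# Custom pipeline: Manhattan distance, still complete linkage.
manhattan_dend <- reorder(
  as.dendrogram(hclust(dist(x, method = "manhattan"), method = "complete")),
  rowMeans(x)
)

pdf(NULL)  # draw to a null device so no plot file is written
hm_default <- heatmap(x)                         # clustering done internally
hm_custom  <- heatmap(x, Rowv = manhattan_dend)  # supplied dendrogram is used
invisible(dev.off())

# heatmap() invisibly returns the row order it plotted (rowInd).
same_as_default <- identical(hm_default$rowInd, order.dendrogram(default_dend))
same_as_custom  <- identical(hm_custom$rowInd, order.dendrogram(manhattan_dend))
```

If both flags are TRUE, the default call matches the hand-built Euclidean/complete dendrogram, and a dendrogram passed to Rowv is used as given. The same comparison should carry over to heatmap.2, but that is an assumption worth checking against its source.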

  • 2021-01-07 04:07

    To look into how it handles Rowv/Colv exactly, you might also use body(heatmap) to display its source.
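
As a rough illustration of that suggestion, the source can be filtered down to the lines that deal with clustering. This sketch assumes base R's stats::heatmap; for heatmap.2 you would substitute gplots::heatmap.2, provided gplots is installed.

```r
# Deparse the function body into lines of text and keep those that
# mention the clustering-related arguments.
src <- deparse(body(stats::heatmap))
clustering_lines <- grep("hclustfun|distfun|reorderfun", src, value = TRUE)
print(trimws(clustering_lines))

# The argument defaults live in the formals, not in the body.
defaults <- formals(stats::heatmap)[c("distfun", "hclustfun")]
print(defaults)
```

Typing the function name without parentheses prints the whole definition, which is easier to read once you know which lines to look for.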

  • 2021-01-07 04:14

    From the manual:

    distfun : function used to compute the distance (dissimilarity) between both rows and columns. Defaults to dist.

    hclustfun : function used to compute the hierarchical clustering when Rowv or Colv are not dendrograms. Defaults to hclust. Should take as argument a result of distfun and return an object to which as.dendrogram can be applied.

    dist() defaults to the Euclidean distance and hclust() to the complete linkage method.
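
These defaults are easy to confirm on a small example. The three points below are made up for illustration; everything else is base R.

```r
# Three hypothetical 2-D points; a and b are a 3-4-5 triangle apart.
pts <- rbind(a = c(0, 0), b = c(3, 4), c = c(6, 8))

d <- dist(pts)                   # no method given
dist_method <- attr(d, "method") # "euclidean"
ab <- as.matrix(d)["a", "b"]     # sqrt(3^2 + 4^2) = 5

hc <- hclust(d)                  # no method given
linkage <- hc$method             # "complete"

print(c(distance = dist_method, linkage = linkage))
```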
