Question
I have a data frame which I am trying to cluster. I am using hclust right now. In my data frame, there is a FLAG column which I would like to color the dendrogram by. From the resulting picture, I am trying to figure out similarities among the various FLAG categories. My data frame looks something like this:
FLAG ColA ColB ColC ColD
I am clustering on ColA, ColB, ColC and ColD. I would like to cluster these and color them according to the FLAG categories, e.g. red if 1, blue if 0 (I have only two categories). Right now I am using the vanilla version of cluster plotting.
hc<-hclust(dist(data[2:5]),method='complete')
plot(hc)
Any help in this regard would be highly appreciated.
Answer 1:
If you want to color the branches of a dendrogram based on a certain variable then the following code (largely taken from the help for the dendrapply function) should give the desired result:
x <- 1:100
dim(x) <- c(10, 10)
groups <- sample(c("red", "blue"), 10, replace = TRUE)
x.clust <- as.dendrogram(hclust(dist(x)))
# colLab keeps a leaf counter (i) and the color vector (mycols) in its enclosing environment
local({
  colLab <<- function(n) {
    if (is.leaf(n)) {
      a <- attributes(n)
      i <<- i + 1
      # colors are taken in the order in which the leaves are visited
      attr(n, "edgePar") <- c(a$edgePar, list(col = mycols[i]))
    }
    n
  }
  mycols <- groups
  i <- 0
})
x.clust.dend <- dendrapply(x.clust, colLab)
plot(x.clust.dend)
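To apply the same idea to a data frame like the one in the question, it may be safer to look up each leaf's color by its original row index rather than by a running counter, since leaves are visited in dendrogram order and not in row order. The following is only a minimal sketch: the data frame `data`, its values, and the helper name `color_leaf_edge` are made up for illustration and are not part of the original answer.

```r
# Hypothetical data standing in for the question's data frame.
set.seed(42)
data <- data.frame(
  FLAG = rep(c(0, 1), each = 5),
  ColA = rnorm(10), ColB = rnorm(10), ColC = rnorm(10), ColD = rnorm(10)
)

# One color per row: red if FLAG is 1, blue if it is 0.
row_cols <- ifelse(data$FLAG == 1, "red", "blue")

hc <- hclust(dist(data[2:5]), method = "complete")
dend <- as.dendrogram(hc)

# Each leaf of a dendrogram built from hclust stores the index of its
# original observation, so as.integer(n) selects the matching row's color.
color_leaf_edge <- function(n) {
  if (is.leaf(n)) {
    attr(n, "edgePar") <- c(attr(n, "edgePar"),
                            list(col = row_cols[as.integer(n)]))
  }
  n
}

dend_colored <- dendrapply(dend, color_leaf_edge)
plot(dend_colored)
```

Because the lookup uses the row index stored in the leaf, the coloring does not depend on the order in which the leaves happen to be traversed.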
Answer 2:
I think Arhopala's answer is good. I took the liberty of going a step further and added the function assign_values_to_leaves_edgePar to the dendextend package (starting from version 0.17.2, which is now on GitHub). This version of the function is a bit more robust and flexible than Arhopala's answer since:
- It is a general function which can work in different problems/settings
- The function can deal with other edgePar parameters (col, lwd, lty)
- The function offers recycling of partial vectors, and various warning messages when needed.
To install the dendextend package you can use install.packages('dendextend'), but for the latest version, use the following code:
require2 <- function(package, ...) {
  # character.only = TRUE makes require()/library() treat `package` as a string
  if (!require(package, character.only = TRUE)) install.packages(package)
  library(package, character.only = TRUE)
}
## require2('installr')
## install.Rtools() # run this if you are using Windows and don't have Rtools installed (you must have it for devtools)
# Load devtools:
require2("devtools")
devtools::install_github('talgalili/dendextend')
Now that we have dendextend installed, here is a second take on Arhopala's answer:
library(dendextend)
x <- 1:100
dim(x) <- c(10, 10)
set.seed(1)
groups <- sample(c("red", "blue"), 10, replace = TRUE)
x.clust <- as.dendrogram(hclust(dist(x)))
x.clust.dend <- x.clust
x.clust.dend <- assign_values_to_leaves_edgePar(x.clust.dend, value = groups, edgePar = "col") # add the colors.
x.clust.dend <- assign_values_to_leaves_edgePar(x.clust.dend, value = 3, edgePar = "lwd") # make the lines thick
plot(x.clust.dend)
Here is the result: a dendrogram whose leaf edges are drawn thick and colored red or blue according to groups.
P.S.: I personally prefer using pipes for this type of code (it gives the same result as above, but is easier to read):
x.clust <- x %>% dist %>% hclust %>% as.dendrogram
x.clust.dend <- x.clust %>%
  assign_values_to_leaves_edgePar(value = groups, edgePar = "col") %>% # add the colors.
  assign_values_to_leaves_edgePar(value = 3, edgePar = "lwd") # make the lines thick
plot(x.clust.dend)
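One caveat when the colors come from a column of the original data rather than a random vector: assign_values_to_leaves_edgePar appears to assign values in the order the leaves occur in the tree, so a vector in row order would likely need to be reordered with order.dendrogram first. A minimal sketch under that assumption, using a hypothetical data frame `data` (its values are invented for illustration) shaped like the one in the question:

```r
library(dendextend)

# Hypothetical data standing in for the question's data frame.
set.seed(42)
data <- data.frame(
  FLAG = rep(c(0, 1), each = 5),
  ColA = rnorm(10), ColB = rnorm(10), ColC = rnorm(10), ColD = rnorm(10)
)

dend <- as.dendrogram(hclust(dist(data[2:5]), method = "complete"))

# Colors in row order (red if FLAG is 1, blue if 0), then reordered to
# follow the left-to-right leaf order of the dendrogram.
row_cols <- ifelse(data$FLAG == 1, "red", "blue")
leaf_cols <- row_cols[order.dendrogram(dend)]

dend <- assign_values_to_leaves_edgePar(dend, value = leaf_cols, edgePar = "col")
plot(dend)
```

With the reordering step, each leaf edge should carry the color of the row it represents, which is what makes the plot usable for comparing FLAG categories.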
Source: https://stackoverflow.com/questions/23328663/color-branches-of-dendrogram-using-an-existing-column