Question
I've used hclust to generate a cluster dendrogram of some data, but I need to isolate all the paired clusters, i.e. all the clusters that comprise just 2 pieces of data (the first ones to be clustered together), even if they might be clustered with other data on a "higher" branch. Does anyone know how I can do that?
I've highlighted the clusters I want to isolate in the attached image; hopefully that explains it better.
I'd like to isolate the paired data in those clusters so that I can compare the clusters by their contents, for example to see which of them contain a particular type of data.
Answer 1:
FWIW, you could extract the "forks" like this:
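hc <- hclust(dist(USArrests), "ave")
plot(hc)

# Collect the labels of every fork whose two children are both leaves
res <- list()
invisible(dendrapply(as.dendrogram(hc), function(x) {
  if (attr(x, "members") == 2)
    if (all(sapply(x[1:2], is.leaf)))
      res <<- c(res, list(c(attr(x[[1]], "label"), attr(x[[2]], "label"))))
  x  # return the node unchanged; dendrapply is used only to walk the tree
}))
head(do.call(rbind, res))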
#      [,1]          [,2]
# [1,] "Florida"     "North Carolina"
# [2,] "Arizona"     "New Mexico"
# [3,] "Alabama"     "Louisiana"
# [4,] "Illinois"    "New York"
# [5,] "Michigan"    "Nevada"
# [6,] "Mississippi" "South Carolina"
(just the first 6 rows of the result)
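As a possible alternative, the same pairs can probably be read straight from the merge matrix that hclust returns, without walking the dendrogram. In hc$merge, a negative entry refers to a single observation, so a row with two negative entries is a fork joining two leaves. The sketch below assumes the same USArrests example; the names is_pair, pair_idx, pair_labels, target and has_target are illustrative, and the rows come out in merge order rather than dendrogram order.

```r
# Sketch: find the two-leaf clusters directly from the merge matrix.
hc <- hclust(dist(USArrests), method = "average")

# A row of hc$merge with two negative entries joins two single observations.
is_pair <- rowSums(hc$merge < 0) == 2
pair_idx <- -hc$merge[is_pair, , drop = FALSE]

# Translate observation indices into labels, one pair per row.
pair_labels <- cbind(hc$labels[pair_idx[, 1]], hc$labels[pair_idx[, 2]])

# Illustrative follow-up: which pairs contain an item of interest?
target <- "Florida"  # hypothetical choice, not from the original question
has_target <- apply(pair_labels, 1, function(p) target %in% p)
pair_labels[has_target, , drop = FALSE]
```

The last two lines show one way to compare pairs on their contents: replace the target test with any predicate over the two labels in a row.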
Source: https://stackoverflow.com/questions/35866470/r-isolate-clusters-with-specific-characteristics-in-hclust