Question
I have a problem with groups in cluster analysis (hierarchical clustering). As an example, consider the complete-linkage dendrogram of the Iris data set.
After running
> table(cutree(hc, 3), iris$Species)
I get this output:
    setosa versicolor virginica
  1     50          0         0
  2      0         23        49
  3      0         27         1
I have read on a statistics website that object 1 in the data always belongs to group (cluster) 1. From the output above, we know that setosa is in group 1. How, then, can I tell which group the other two species belong to? How do they end up in group 2 or group 3? Is there a calculation I need to know about?
Answer 1:
I'm guessing you used something like the following to create the image, which doesn't appear in the question at the moment.
> lmbjck <- cutree(hclust(dist(iris[1:4], "euclidean")), 3)
> table(lmbjck, iris$Species)
lmbjck setosa versicolor virginica
     1     50          0         0
     2      0         23        49
     3      0         27         1
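As for how the groups get their numbers: R's cutree appears to hand out group labels in order of first appearance while scanning the rows in their original order. Row 1 therefore always lands in group 1, group 2 is the group of the first row that is not in group 1, and so on; the numbers carry no other meaning. For iris, group 1 is exactly the 50 setosa rows (rows 1 to 50), so row 51, a versicolor, opens group 2. A minimal check, assuming the complete linkage used above (groups, first_rows and relabelled are just names I picked):

```r
# Complete linkage (hclust's default) on the four iris measurements
hc <- hclust(dist(iris[1:4], "euclidean"), method = "complete")
groups <- cutree(hc, k = 3)

# Row index at which each group label first appears in the original data order
first_rows <- match(1:3, groups)
print(first_rows)   # starts at row 1 and increases

# Relabelling by order of first appearance should leave the labels unchanged
relabelled <- match(groups, unique(groups))
print(all(relabelled == groups))
```

If the last line prints TRUE, the labels are numbered purely by which row of the data shows up first in each group.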
The dist object is computed from the four measurements of 150 plants from three different species; its rows and columns carry identical labels.
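> iris.dist <- dist(iris[1:4], "euclidean")
> # a dist object has no dimnames of its own, so compare the labels of the full matrix
> identical(rownames(as.matrix(iris.dist)), colnames(as.matrix(iris.dist)))
[1] TRUE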
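For reference, a dist object stores only the lower triangle of the distance matrix, n(n - 1)/2 values for n rows, and as.matrix() expands it into a full symmetric matrix labelled by row. A short sketch of that structure (n and m are names I chose):

```r
iris.dist <- dist(iris[1:4], "euclidean")

n <- attr(iris.dist, "Size")   # number of observations (150)
print(n)
print(length(iris.dist))       # n * (n - 1) / 2 pairwise distances

# Expand to a full symmetric matrix; its dimnames are the row labels
m <- as.matrix(iris.dist)
print(dim(m))
print(isSymmetric(m))

# The first stored value is the distance between rows 1 and 2
print(all.equal(m[2, 1], iris.dist[1]))
```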
That object is passed on to hclust, which constructs the tree; cutree then cuts it into three groups. The object iris.order holds the order in which the leaves are drawn in the dendrogram. The original order of the rows is preserved; only the drawing of the tree is based on this ordering.
> iris.hclust <- hclust(iris.dist)
> iris.cutree <- cutree(iris.hclust, 3)
> iris.order <- iris.hclust$order
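Both halves of that claim can be checked: the order component is just a permutation of the row indices 1 to 150, and cutree returns one label per row of iris, aligned with the data rather than with the dendrogram. A sketch that recomputes the objects so it runs on its own:

```r
iris.dist <- dist(iris[1:4], "euclidean")
iris.hclust <- hclust(iris.dist)            # complete linkage by default
iris.cutree <- cutree(iris.hclust, 3)
iris.order <- iris.hclust$order

# order is a permutation of the row indices: every row appears exactly once
print(all(sort(iris.order) == seq_len(nrow(iris))))

# cutree gives one group per row, in the original row order
print(length(iris.cutree) == nrow(iris))

# Species of the leaves from left to right in the dendrogram
print(head(iris$Species[iris.order]))
```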
Here's the proof. I've put together the original Species designations, the species designations in the order they appear in the dendrogram, the order numbers, and the groups from cutree.
> data.frame(original = iris$Species, ordered = iris$Species[iris.order],
+            order.num = iris.order, cutree = iris.cutree)
     original   ordered order.num cutree
1      setosa virginica       108      1
2      setosa virginica       131      1
3      setosa virginica       103      1
4      setosa virginica       126      1
5      setosa virginica       130      1
6      setosa virginica       119      1
...
103 virginica    setosa        31      2
104 virginica    setosa        26      2
105 virginica    setosa        10      2
106 virginica    setosa        35      2
107 virginica    setosa        13      3
108 virginica    setosa         2      2
...
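In that data frame only ordered and order.num follow the dendrogram; original and cutree stay in the original row order. If you would rather have every column describe the same leaf, index the cutree result with the same order vector. A sketch (leaves, leaf, species and group are names I made up):

```r
iris.hclust <- hclust(dist(iris[1:4], "euclidean"))
iris.cutree <- cutree(iris.hclust, 3)
iris.order <- iris.hclust$order

# Every column is permuted by the same order vector, so each line is one leaf
leaves <- data.frame(leaf    = iris.order,
                     species = iris$Species[iris.order],
                     group   = iris.cutree[iris.order])
print(head(leaves))

# Each group is a subtree, so its leaves are contiguous in the dendrogram and
# the group column changes value only twice when read from left to right
print(sum(diff(leaves$group) != 0))
```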
Let's look at the output. In the first line, under order.num, there is the number 108. This means that this item (the first leaf on the left side of the dendrogram) comes from row 108. Skim down to line 108 and you can see that its original Species is indeed virginica, and that cutree assigns it to group 2. (The cutree value of 1 in the first line belongs to row 1, a setosa, because the original and cutree columns are in the original row order.) Now look at line 3. Under order.num you can see that this item comes from row 103. Again, if you go down and check the original species in row 103, it is (still) virginica. I'll leave it as an exercise for you to check other (random) rows and convince yourself that the row order used to construct the table at the beginning is preserved. Ergo, the table is correct.
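The same conclusion can be checked in one go: reordering the group labels and the species labels with the same order vector cannot change their cross-tabulation, so reading the data in dendrogram order gives exactly the table from the question. A minimal sketch (tab_original and tab_dendro are illustrative names):

```r
iris.hclust <- hclust(dist(iris[1:4], "euclidean"))
iris.cutree <- cutree(iris.hclust, 3)
iris.order <- iris.hclust$order

# Cross-tabulation in the original row order, as in the question
tab_original <- table(iris.cutree, iris$Species)
# Same cross-tabulation after putting both vectors in dendrogram order
tab_dendro <- table(iris.cutree[iris.order], iris$Species[iris.order])

print(tab_original)
print(all(tab_original == tab_dendro))
```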
Source: https://stackoverflow.com/questions/11489696/how-to-know-about-group-information-in-cluster-analysis-hierarchical