How to know about group information in cluster analysis (hierarchical)?

Submitted by 南笙酒味 on 2019-12-05 05:47:50

Question


I have a problem with groups in cluster analysis (hierarchical clustering). As an example, this is the complete-linkage dendrogram of the Iris data set.

After I run

> table(cutree(hc, 3), iris$Species)

This is the output:

  setosa versicolor virginica
1     50          0         0
2      0         23        49
3      0         27         1

I have read on a statistics website that object 1 in the data always belongs to group/cluster 1. From the output above, we know that setosa is in group 1. How, then, can I tell which group the other two species belong to? How does each of them end up in group 2 or group 3? Is there a calculation I need to know?


Answer 1:


I'm guessing that you used something like this to create the dendrogram image, which doesn't appear in your post at the moment.

> lmbjck <- cutree(hclust(dist(iris[1:4], "euclidean")), 3)
> table(lmbjck, iris$Species)

lmbjck setosa versicolor virginica
     1     50          0         0
     2      0         23        49
     3      0         27         1
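On the numbering question itself: to my understanding, cutree() assigns cluster numbers in order of first appearance among the observations in their original row order. That is why observation 1 is always in cluster 1; cluster 2 is whichever cluster contains the first observation not in cluster 1, and so on. The sketch below (base R only; `relabelled`, `first.obs`, `tab`, `majority` and `purity` are illustrative names, not part of any API) checks that property on this clustering and then reads the table row by row to see which species dominates each cluster.

```r
# Sketch (base R only, built-in iris data): how cutree() numbers clusters,
# and how to interpret the cluster-vs-species table.
lmbjck <- cutree(hclust(dist(iris[1:4], "euclidean")), 3)

# Labels follow first appearance in the data: observation 1 gets 1, the
# first observation outside cluster 1 gets 2, and so on. Relabelling by
# first appearance should therefore change nothing.
relabelled <- match(lmbjck, unique(lmbjck))
print(identical(unname(relabelled), unname(lmbjck)))

# Row number of the observation that "opens" each cluster, and its species.
first.obs <- match(1:3, lmbjck)
print(data.frame(cluster = 1:3, first.obs = first.obs,
                 species = iris$Species[first.obs]))

# The labels themselves say nothing about species; the table does.
# For each cluster (row): the most frequent species and its share.
tab <- table(lmbjck, iris$Species)
majority <- colnames(tab)[apply(tab, 1, which.max)]
purity <- apply(tab, 1, max) / rowSums(tab)
print(data.frame(cluster = rownames(tab), majority = majority,
                 purity = round(as.vector(purity), 2)))
```

In other words, there is no species-based calculation behind "2" versus "3": the numbers only record which cluster shows up first when scanning the rows, and the contingency table is what tells you which species each cluster mostly contains.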

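The distance object is computed from the four measurement columns (iris[1:4]) of the 150 plants from three species, and it keeps the observations in their original row order. (Strictly speaking, the check below is trivially TRUE: a dist object has no dimnames, so rownames() and colnames() both return NULL.)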

> iris.dist <- dist(iris[1:4], "euclidean")
> identical(rownames(iris.dist), colnames(iris.dist))
[1] TRUE
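A minimal sketch of what the dist object actually holds and how its entries line up with the rows of iris. It assumes only base R and the built-in iris data; `full`, `x1`, `x51` and `manual` are illustrative names.

```r
# Minimal sketch of what a "dist" object holds (base R, built-in iris data).
iris.dist <- dist(iris[1:4], "euclidean")

# Only the lower triangle of the 150 x 150 matrix is stored, plus attributes.
print(attr(iris.dist, "Size"))          # number of observations
print(length(iris.dist))                # 150 * 149 / 2 pairwise distances
print(is.null(rownames(iris.dist)))     # no dimnames on a dist object

# as.matrix() expands it to a full square matrix; row i and column i
# correspond to row i of iris.
full <- as.matrix(iris.dist)
print(dim(full))

# Check one entry by hand: Euclidean distance between rows 1 and 51.
x1  <- unlist(iris[1, 1:4], use.names = FALSE)
x51 <- unlist(iris[51, 1:4], use.names = FALSE)
manual <- sqrt(sum((x1 - x51)^2))
print(all.equal(full[1, 51], manual))
```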

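That object is passed to hclust, which builds the tree, and cutree then cuts the tree into three groups. The vector iris.order holds the order in which the observations are drawn as leaves of the dendrogram, from left to right. The original order of the observations is preserved in the results; only the drawing of the tree is based on this ordering.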

> iris.hclust <- hclust(iris.dist)
> iris.cutree <- cutree(iris.hclust, 3)
> iris.order <- iris.hclust$order
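A small sketch, assuming the objects are built exactly as in the lines above, that checks two things: iris.order is a permutation of the row numbers, and indexing iris.cutree with it gives the group labels in dendrogram order, where each of the three clusters forms one contiguous run of leaves. `leaf.groups` and `runs` are illustrative names.

```r
# Sketch: reading cutree() output in dendrogram order (base R, iris data).
iris.hclust <- hclust(dist(iris[1:4], "euclidean"))
iris.cutree <- cutree(iris.hclust, 3)
iris.order  <- iris.hclust$order

# $order is a permutation of the row numbers 1..150.
print(identical(sort(iris.order), seq_len(nrow(iris))))

# cutree() results are in the original row order; indexing them with
# $order gives the group of each leaf, left to right along the plot.
leaf.groups <- iris.cutree[iris.order]

# A cut at k = 3 yields three subtrees, and the leaves of a subtree are
# adjacent in the drawing, so the labels form exactly three runs.
runs <- rle(leaf.groups)
print(runs$values)
print(runs$lengths)
```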

Here's a check. I've put together the original Species labels, the species labels in the order they appear in the dendrogram, the order number, and the group assigned by cutree.

> data.frame(original = iris$Species, ordered = iris$Species[iris.order],
             order.num = iris.order, cutree = iris.cutree)

      original    ordered order.num cutree
1       setosa  virginica       108      1
2       setosa  virginica       131      1
3       setosa  virginica       103      1
4       setosa  virginica       126      1
5       setosa  virginica       130      1
6       setosa  virginica       119      1
    ...
103  virginica     setosa        31      2
104  virginica     setosa        26      2
105  virginica     setosa        10      2
106  virginica     setosa        35      2
107  virginica     setosa        13      3
108  virginica     setosa         2      2
    ...

Let's look at the output. In the first line, order.num is 108. This means that this item (the first leaf on the left side of the dendrogram) comes from row 108. Skim down to line 108 and you can see that its original species is indeed virginica, and its cutree column shows that cutree assigns it to group 2. (The 1 in the cutree column of the first line belongs to row 1, a setosa: like original, the cutree column is listed in the original row order, not the dendrogram order.) Now look at line 3. Under order.num you can see that this item comes from row 103. Again, if you go down and check row 103, its original species is (still) virginica, in group 2. I'll leave it as an exercise for you to check other (random) rows and convince yourself that the original order is preserved when the table at the beginning is constructed. Ergo, the table is correct.
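To read the groups along the dendrogram directly, here is a sketch that puts every column in leaf order, so that each printed row describes a single observation. It assumes only base R; `leaves`, `t.orig` and `t.leaf` are illustrative names. The final check confirms that reordering does not change the cross-tabulation.

```r
# Sketch: one row per dendrogram leaf, every column in leaf order.
iris.hclust <- hclust(dist(iris[1:4], "euclidean"))
iris.cutree <- cutree(iris.hclust, 3)
iris.order  <- iris.hclust$order

leaves <- data.frame(
  leaf    = seq_along(iris.order),      # position along the dendrogram
  obs     = iris.order,                 # original row number in iris
  species = iris$Species[iris.order],   # species of that observation
  cutree  = iris.cutree[iris.order]     # group of that observation
)
print(head(leaves))

# Same counts whether the table is built in original or leaf order.
t.orig <- table(iris.cutree, iris$Species)
t.leaf <- table(leaves$cutree, leaves$species)
print(identical(as.vector(t.orig), as.vector(t.leaf)))
```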



Source: https://stackoverflow.com/questions/11489696/how-to-know-about-group-information-in-cluster-analysis-hierarchical
