Pruning dendrogram in scipy (hierarchical clustering)

Submitted by 牧云@^-^@ on 2019-12-21 03:52:54

Question


I have a distance matrix with about 5000 entries, and use scipy's hierarchical clustering methods to cluster the matrix. The code I use for this is the following snippet:

import fastcluster
import scipy.cluster.hierarchy as sch

Y = fastcluster.linkage(D, method='centroid')  # D: distance matrix
Z1 = sch.dendrogram(Y, truncate_mode='level', p=7, show_contracted=True)

Since the dendrogram will become rather dense with all this data, I use the truncate_mode to prune it a bit. All of this works, but I wonder how I can find out which of the original 5000 entries belong to a particular branch in the dendrogram.

I tried using

 leaves = sch.leaves_list(Y)

to get a list of leaves, but this takes the linkage output as input, and while I can see the correspondence between the pruned dendrogram and the leaves list, mapping the original entries to the dendrogram by hand is cumbersome.

To summarize: is there a way of listing all the original entries in the distance matrix that belong to a branch in a pruned dendrogram? Or are there other methods of doing this that I am not aware of?

Thanks


Answer 1:


The dictionary returned by scipy.cluster.hierarchy.dendrogram has the key ivl, which the documentation describes as:

a list of labels corresponding to the leaf nodes

You can supply custom labels (using labels=<array of labels>) as input to the dendrogram function, but by default they are just the indices of the original observations. By comparing the original labels/indices with Z1['ivl'], you can determine what the original entries were.
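One caveat: when the dendrogram is truncated, the ivl entry for a contracted branch is by default a leaf count such as "(12)" and not a list of members. A possible way to recover the members is to pair ivl with the 'leaves' key of the same dictionary, which holds the cluster node id of each displayed leaf, and look each id up in the tree built by scipy.cluster.hierarchy.to_tree. The sketch below is only an illustration under assumptions: the random data, the 'average' method, p=3, and the name branch_members are made up for the example, and scipy's own linkage stands in for fastcluster.

```python
import numpy as np
import scipy.cluster.hierarchy as sch
from scipy.spatial.distance import pdist

# Hypothetical stand-in data: 50 random points instead of the 5000 entries.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
D = pdist(X)                          # condensed distance matrix
Y = sch.linkage(D, method='average')  # assumption: any linkage matrix works here

# no_plot=True computes the dendrogram layout without drawing it.
Z1 = sch.dendrogram(Y, truncate_mode='level', p=3, no_plot=True)

# rd=True also returns a list in which nodes[i] is the cluster node with id i.
root, nodes = sch.to_tree(Y, rd=True)

# Z1['leaves'] and Z1['ivl'] are parallel: one node id and one label
# per leaf of the truncated dendrogram, in left-to-right order.
branch_members = {}
for node_id, label in zip(Z1['leaves'], Z1['ivl']):
    # pre_order() lists the ids of the original observations under this node.
    branch_members[node_id] = nodes[node_id].pre_order()
    print(label, branch_members[node_id])
```

Keying the result by node id instead of by label avoids collisions, since two contracted branches of the same size share the same "(n)" label.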



Source: https://stackoverflow.com/questions/10305111/pruning-dendrogram-in-scipy-hierarchical-clustering
