Alternative to scipy.cluster.hierarchy.cut_tree()

Posted by 六月ゝ 毕业季﹏ on 2019-12-10 17:12:12

Question


I was running an agglomerative hierarchical clustering experiment in Python 3 and found that scipy.cluster.hierarchy.cut_tree() does not return the requested number of clusters for some input linkage matrices. So I now know there is a bug in the cut_tree() function (as described here).

However, I need to be able to get a flat clustering with an assignment of k different labels to my datapoints. Do you know the algorithm to get a flat clustering with k labels from an arbitrary input linkage matrix Z? My question boils down to: how can I compute what cut_tree() is computing from scratch with no bugs?

You can test your code with this dataset.

import numpy as np
from scipy.cluster.hierarchy import linkage, is_valid_linkage
from scipy.spatial.distance import pdist

## Load dataset
X = np.load("dataset.npy")

## Hierarchical clustering
dists = pdist(X)
Z = linkage(dists, method='centroid', metric='euclidean')

print(is_valid_linkage(Z))

## Now let's say we want the flat cluster assignment with 10 clusters.
#  If cut_tree() were working we would do
from scipy.cluster.hierarchy import cut_tree
cut = cut_tree(Z, 10)

Side note: an alternative approach might be to use R's cutree() through rpy2 as a substitute for scipy's cut_tree(), but I have never used it. What do you think?


Answer 1:


One way to obtain k flat clusters is to use scipy.cluster.hierarchy.fcluster with criterion='maxclust'. Note that 'maxclust' forms at most k flat clusters, and the returned labels start at 1:

from scipy.cluster.hierarchy import fcluster
clust = fcluster(Z, k, criterion='maxclust')
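If exactly k labels are needed, the flat clustering can also be computed directly from Z. The following is only a minimal sketch, not scipy's implementation: it relies on the documented linkage format, where row i of Z merges clusters Z[i, 0] and Z[i, 1] into a new cluster with id n + i, and it replays the first n - k rows. The helper name cut_by_merge_order is hypothetical, and the random data stands in for the dataset from the question.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage


def cut_by_merge_order(Z, k):
    """Hypothetical helper: flat clustering with exactly k labels.

    Replays the first n - k merges recorded in the linkage matrix Z,
    following the row order of Z rather than the merge heights.
    """
    n = Z.shape[0] + 1  # number of original observations
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of observations")

    # members[c] lists the original observations currently in cluster c
    members = {i: [i] for i in range(n)}
    for step in range(n - k):
        a, b = int(Z[step, 0]), int(Z[step, 1])
        # Row `step` creates the new cluster with id n + step
        members[n + step] = members.pop(a) + members.pop(b)

    # Relabel the k surviving clusters as 0 .. k-1
    labels = np.empty(n, dtype=int)
    for label, obs in enumerate(members.values()):
        labels[obs] = label
    return labels


# Synthetic stand-in for the question's dataset (assumption)
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
Z = linkage(X, method='centroid')

labels = cut_by_merge_order(Z, 10)
print(len(np.unique(labels)))  # 10 by construction
```

Because each replayed row reduces the number of clusters by one, this returns exactly k labels even when the merge heights are not monotonic, which can happen with method='centroid'. Whether that matches what cut_tree() is meant to return in every such case is an assumption worth checking against your own data.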


Source: https://stackoverflow.com/questions/46869640/alternative-to-scipy-cluster-hierarchy-cut-tree
