问题
The following code generates a simple hierarchical cluster dendrogram with 10 leaf nodes:
import scipy
import scipy.cluster.hierarchy as sch
import matplotlib.pylab as plt
X = scipy.randn(10,2)
d = sch.distance.pdist(X)
Z= sch.linkage(d,method='complete')
P =sch.dendrogram(Z)
plt.show()
I generate three flat clusters like so:
T = sch.fcluster(Z, 3, 'maxclust')
# array([3, 1, 1, 2, 2, 2, 2, 2, 1, 2])
However, I'd like to see the cluster labels 1,2,3 on the dendrogram. It's easy for me to visualize with just 10 leaf nodes and three clusters, but when I have 1000 nodes and 10 clusters, I can't see what's going on.
How do I show the cluster numbers on the dendrogram? I'm open to other packages. Thanks.
回答1:
Here is a solution that appropriately colors the clusters and labels the leaves of the dendrogram with the appropriate cluster name (leaves are labeled: 'point number, cluster number'). These techniques can be used independently or together. I modified your original example to include both:
import scipy
import scipy.cluster.hierarchy as sch
import matplotlib.pylab as plt
n=10
k=3
X = scipy.randn(n,2)
d = sch.distance.pdist(X)
Z= sch.linkage(d,method='complete')
T = sch.fcluster(Z, k, 'maxclust')
# calculate labels
labels=list('' for i in range(n))
for i in range(n):
labels[i]=str(i)+ ',' + str(T[i])
# calculate color threshold
ct=Z[-(k-1),2]
#plot
P =sch.dendrogram(Z,labels=labels,color_threshold=ct)
plt.show()
来源:https://stackoverflow.com/questions/19778167/matching-dendrogram-with-cluster-number-in-pythons-scipy-cluster-hierarchy