Question
In Python, I have an N by N distance matrix dmat, where dmat[i,j] encodes the distance from entity i to entity j. I'd like to view a dendrogram. I did:
from scipy.cluster.hierarchy import dendrogram, linkage
import matplotlib.pyplot as plt
labels = [...]  # names of entity 1, 2, 3, ...
Z=linkage(dmat)
dn=dendrogram(Z,labels=labels)
plt.show()
But the label ordering looks wrong. Some entities are very close according to dmat, but that is not reflected in the dendrogram. What's going on?
Answer 1:
The first argument to linkage must be either the distances in condensed format or the array of points being clustered. If you pass the square (N x N) distance matrix, linkage interprets it as N points in N-dimensional space: each row is treated as the coordinates of one point, and new distances are computed between the rows.
You can convert from your square matrix to the condensed form with scipy.spatial.distance.squareform.
Add this to the beginning of your file:
from scipy.spatial.distance import squareform
and replace this
Z=linkage(dmat)
with
Z = linkage(squareform(dmat))
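As a minimal sketch of the difference, the example below builds a small distance matrix from made-up one-dimensional points (the points and the labels a to e are assumptions for illustration, not from the question) and compares the linkage computed from the condensed form with the one computed from the square matrix. Some SciPy versions emit a warning when a square distance matrix is passed directly, so that call is wrapped to keep the output quiet.

```python
import warnings

import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage
from scipy.spatial.distance import pdist, squareform

# Hypothetical data: five points on a line, so the true distances are obvious.
points = np.array([[0.0], [1.0], [10.0], [11.0], [30.0]])
labels = ["a", "b", "c", "d", "e"]

# Square N x N distance matrix, like dmat in the question.
dmat = squareform(pdist(points))

# Condensed form: the N*(N-1)/2 upper-triangle distances as a 1-D array.
condensed = squareform(dmat)
Z_correct = linkage(condensed)

# Passing the square matrix makes linkage treat each row as a point
# in N-dimensional space and recompute distances between rows.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    Z_wrong = linkage(dmat)

# Column 2 of a linkage matrix holds the merge distances.
print("first merge height (condensed):", Z_correct[0, 2])
print("first merge height (square):   ", Z_wrong[0, 2])

# no_plot=True returns the leaf order without needing matplotlib.
dn = dendrogram(Z_correct, labels=labels, no_plot=True)
print("leaf order:", dn["ivl"])
```

With the condensed input, the first merge happens at the real nearest-pair distance from dmat; with the square input, the merge heights are distances between rows of dmat, which is why the resulting tree can disagree with the matrix.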
Source: https://stackoverflow.com/questions/48331537/label-ordering-in-scipy-dendrogram