scipy.cluster.hierarchy: labels seem not to be in the right order, and confused by the values on the vertical axis

Submitted by 孤街浪徒 on 2019-12-02 18:25:19

Question


I know that scipy.cluster.hierarchy is designed to work with a distance matrix, but what I have is a similarity matrix. When I plot it with dendrogram, something weird happens. Here is the code:

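import numpy as np
import matplotlib.pyplot as plt
import scipy.cluster.hierarchy as sch

# Symmetric 7x7 similarity matrix (1 on the diagonal means identical)
similarityMatrix = np.array([[1,0.75,0.75,0,0,0,0],
                             [0.75,1,1,0.25,0,0,0],
                             [0.75,1,1,0.25,0,0,0],
                             [0,0.25,0.25,1,0.25,0.25,0],
                             [0,0,0,0.25,1,1,0.75],
                             [0,0,0,0.25,1,1,0.75],
                             [0,0,0,0,0.75,0.75,1]])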

Here is the linkage call and the plot:

Z_sim = sch.linkage(similarityMatrix)
plt.figure(1)
plt.title('similarity')
sch.dendrogram(
    Z_sim,
    labels=['1','2','3','4','5','6','7']
)
plt.show()

But here is the outcome:

My questions are:

  1. Why are the labels of this dendrogram not in the right order?
  2. I am passing a similarity matrix to linkage, but I cannot fully understand what the vertical axis means. For example, since the maximum similarity is 1, why is the maximum value on the vertical axis almost 1.6?

Thank you very much for your help!


Answer 1:


  • linkage expects "distances", not "similarities". To convert your matrix to something like a distance matrix, you can subtract it from 1:

    dist = 1 - similarityMatrix
    
  • linkage does not accept a square distance matrix. It expects the distance data to be in "condensed" form. You can get that using scipy.spatial.distance.squareform:

    from scipy.spatial.distance import squareform
    
    dist = 1 - similarityMatrix
    condensed_dist = squareform(dist)
    Z_sim = sch.linkage(condensed_dist)
    

    (When you pass a two-dimensional array with shape (m, n) to linkage, it treats the rows as points in n-dimensional space, and computes the distances internally.)
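The two points above can be checked with a minimal sketch, assuming the 7x7 matrix from the question. The names `similarity`, `Z_rows`, `Z_pdist`, `condensed`, and `leaf_order` are illustrative and not part of the original post. The sketch compares what the original call computed (Euclidean distances between rows) with the linkage built from `1 - similarity` in condensed form, and uses `no_plot=True` so the leaf order can be inspected without drawing a figure.

```python
import numpy as np
import scipy.cluster.hierarchy as sch
from scipy.spatial.distance import pdist, squareform

similarity = np.array([[1, 0.75, 0.75, 0, 0, 0, 0],
                       [0.75, 1, 1, 0.25, 0, 0, 0],
                       [0.75, 1, 1, 0.25, 0, 0, 0],
                       [0, 0.25, 0.25, 1, 0.25, 0.25, 0],
                       [0, 0, 0, 0.25, 1, 1, 0.75],
                       [0, 0, 0, 0.25, 1, 1, 0.75],
                       [0, 0, 0, 0, 0.75, 0.75, 1]])
labels = ['1', '2', '3', '4', '5', '6', '7']

# What the original call did: each row is treated as a point in 7-D space,
# so the merge heights are Euclidean distances between rows, not similarities.
Z_rows = sch.linkage(similarity)
Z_pdist = sch.linkage(pdist(similarity))  # explicit version of the same thing
same_as_pdist = np.allclose(Z_rows, Z_pdist)
max_height_rows = Z_rows[:, 2].max()      # can exceed 1

# Corrected call: convert similarity to distance, then to condensed form.
dist = 1.0 - similarity
condensed = squareform(dist)              # requires symmetry and a zero diagonal
Z_sim = sch.linkage(condensed)
max_height_sim = Z_sim[:, 2].max()        # bounded by the largest distance, 1

# no_plot=True returns the layout data without drawing; 'ivl' holds the
# leaf labels in the left-to-right order they would appear on the plot.
info = sch.dendrogram(Z_sim, labels=labels, no_plot=True)
leaf_order = info['ivl']

print(same_as_pdist, max_height_rows, max_height_sim)
print(leaf_order)
```

With the corrected linkage, the vertical axis of the dendrogram is in units of `1 - similarity`, so a merge drawn at height 0.25 joins clusters whose closest members have similarity 0.75 (the default method is single linkage).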



Source: https://stackoverflow.com/questions/40700628/scipy-cluster-hierarchy-labels-seems-not-in-the-right-order-and-confused-by-th
