Dendrogram or Other Plot from Distance Matrix

不知归路 2021-02-06 03:20

I have three matrices to compare. Each of them is 5x6. I originally wanted to use hierarchical clustering to cluster the matrices, such that the most similar matrices are groupe

1 Answer
  • 2021-02-06 03:59

    The first argument of linkage should not be the square distance matrix. It must be the condensed distance matrix, i.e. the upper triangle of the square matrix flattened row by row. In your case, that would be np.array([2.0, 3.8459253727671276e-16, 2]). You can convert from the square distance matrix to the condensed form using scipy.spatial.distance.squareform.
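As a minimal sketch of that conversion: the square matrix below is reconstructed from the condensed values quoted above, so treat it as an assumption about the question's data rather than the original array. The variable names are illustrative only.

```python
import numpy as np
from scipy.spatial.distance import squareform

# Square distance matrix, reconstructed (assumption) from the condensed
# values quoted in the answer: symmetric, with a zero diagonal.
tiny = 3.8459253727671276e-16
square = np.array([
    [0.0, 2.0, tiny],
    [2.0, 0.0, 2.0],
    [tiny, 2.0, 0.0],
])

# squareform flattens the upper triangle row by row:
# d(0,1), d(0,2), d(1,2)
condensed = squareform(square)
print(condensed)

# Calling squareform on a condensed vector goes the other way.
roundtrip = squareform(condensed)
print(roundtrip.shape)
```

By default squareform checks that a square input is symmetric with a zero diagonal, so a matrix with rounding noise on the diagonal may need to be cleaned up first.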

    If you pass a two-dimensional array with shape (m, n) to linkage, it treats the array as m points in n-dimensional space and computes the distances between those points itself. That's why you didn't get an error when you passed in the square distance matrix, but you got an incorrect plot. (This behavior of linkage is easy to overlook; current SciPy documentation does describe the observation-matrix input.)
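The difference can be checked directly. The sketch below uses the 3x3 matrix from the script in this answer (with the tiny distance set to 0.1) and compares three calls; the names Z_points, Z_equiv, and Z_dists are made up for illustration. Warnings are suppressed because some SciPy versions warn when the input looks like an uncondensed distance matrix.

```python
import warnings

import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import pdist, squareform

mat = np.array([[0.0, 2.0, 0.1], [2.0, 0.0, 2.0], [0.1, 2.0, 0.0]])

# Passing the square matrix: each row is treated as a point in 3-D space.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    Z_points = linkage(mat, "single")

# Equivalent explicit form: Euclidean distances between the rows.
Z_equiv = linkage(pdist(mat), "single")

# Intended use: the entries of mat are the distances themselves.
Z_dists = linkage(squareform(mat), "single")

# Column 2 of a linkage matrix holds the merge heights.
print("rows as points:   ", Z_points[:, 2])
print("rows via pdist:   ", Z_equiv[:, 2])
print("condensed input:  ", Z_dists[:, 2])
```

The first two results agree with each other and differ from the third, which is the one that reflects the distances in mat.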

    Also note that because the distance 3.8e-16 is so small, the horizontal line for the link between points 0 and 2 might not be visible in the plot: it lies on the x axis.
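One way to confirm that the link exists even when it cannot be seen is to look at the linkage matrix itself rather than the plot. This is a small sketch using the condensed values quoted earlier in this answer; the variable names are illustrative.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

# Condensed distances quoted in the answer: d(0,1), d(0,2), d(1,2)
condensed = np.array([2.0, 3.8459253727671276e-16, 2.0])

Z = linkage(condensed, "single")

# Each row of Z is one merge: [cluster_a, cluster_b, height, size].
first_pair = sorted(int(i) for i in Z[0, :2])
heights = Z[:, 2]

print("first merge joins points:", first_pair)
print("merge heights:", heights)
```

The first merge height is the tiny distance itself, which is why the corresponding horizontal line is drawn at essentially zero height.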

    Here's a modified version of your script. For this example, I've changed that tiny distance to 0.1, so the associated cluster is not obscured by the x axis.

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.cluster.hierarchy import dendrogram, linkage
    from scipy.spatial.distance import squareform

    # Square, symmetric distance matrix (tiny distance replaced by 0.1).
    mat = np.array([[0.0, 2.0, 0.1], [2.0, 0.0, 2.0], [0.1, 2.0, 0.0]])

    # Convert to the condensed form that linkage expects.
    dists = squareform(mat)
    linkage_matrix = linkage(dists, "single")

    dendrogram(linkage_matrix, labels=["0", "1", "2"])
    plt.title("test")
    plt.show()

    Here is the plot created by the script:
