I have three matrices to compare. Each of them is 5x6. I originally wanted to use hierarchical clustering to cluster the matrices, such that the most similar matrices are groupe
The first argument of linkage should not be the square distance matrix. It must be the condensed distance matrix. In your case, that would be np.array([2.0, 3.8459253727671276e-16, 2]). You can convert from the square distance matrix to the condensed form using scipy.spatial.distance.squareform.
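As a minimal sketch of that conversion, assuming your square matrix is exactly symmetric with zeros on the diagonal (I've reconstructed it here from the three distances quoted above, so treat the variable names and layout as illustrative):

```python
import numpy as np
from scipy.spatial.distance import squareform

# Assumed square distance matrix, rebuilt from the three pairwise distances above.
tiny = 3.8459253727671276e-16
square = np.array([[0.0, 2.0, tiny],
                   [2.0, 0.0, 2.0],
                   [tiny, 2.0, 0.0]])

# Square (3, 3) -> condensed (3,): the upper triangle read row by row,
# i.e. the distances for the pairs (0, 1), (0, 2), (1, 2).
condensed = squareform(square)
print(condensed)

# squareform also goes the other way: condensed -> square.
restored = squareform(condensed)
print(restored.shape)
```

The condensed array has n*(n-1)/2 entries for n items, so three matrices give three distances.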
If you pass a two dimensional array to linkage with shape (m, n), it treats it as an array of m points in n-dimensional space and it computes the distances of those points itself. That's why you didn't get an error when you passed in the square distance matrix--but you got an incorrect plot. (This is an undocumented "feature" of linkage.)
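To see the difference concretely, here is a small sketch that compares the two interpretations. It assumes the default Euclidean metric and uses 0.1 in place of the tiny distance; the variable names are mine. Some SciPy versions emit a warning when a 2-d input looks like a square distance matrix, so the sketch silences warnings around that call.

```python
import warnings

import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import pdist, squareform

mat = np.array([[0.0, 2.0, 0.1],
                [2.0, 0.0, 2.0],
                [0.1, 2.0, 0.0]])

# 2-d input: each ROW is treated as a point in 3-dimensional space.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    as_points = linkage(mat, "single")

# The same thing written out explicitly: pairwise distances between the rows.
explicit = linkage(pdist(mat), "single")

# 1-d condensed input: the entries are used as the distances themselves.
as_distances = linkage(squareform(mat), "single")

print(np.allclose(as_points, explicit))  # the two "points" variants agree
print(as_points[:, 2])     # merge heights when rows are treated as points
print(as_distances[:, 2])  # merge heights when entries are treated as distances
```

The third column of the linkage matrix holds the merge heights, which is where the two interpretations disagree.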
Also note that because the distance 3.8e-16 is so small, the horizontal line associated with the link between points 0 and 2 might not be visible in the plot--it is on the x axis.
Here's a modified version of your script. For this example, I've changed that tiny distance to 0.1, so the associated cluster is not obscured by the x axis.
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage
from scipy.spatial.distance import squareform
import matplotlib.pyplot as plt

# Square distance matrix, with the tiny 3.8e-16 distance replaced by 0.1.
mat = np.array([[0.0, 2.0, 0.1], [2.0, 0.0, 2.0], [0.1, 2.0, 0.0]])
# Convert to the condensed form that linkage expects.
dists = squareform(mat)
linkage_matrix = linkage(dists, "single")
dendrogram(linkage_matrix, labels=["0", "1", "2"])
plt.title("test")
plt.show()
Here is the plot created by the script:
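If you want to check the result without looking at a figure, the same structure can be read off the linkage matrix directly. This is only a sketch using the same example values; the no_plot=True option of dendrogram returns the layout as a dictionary instead of drawing it.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage
from scipy.spatial.distance import squareform

mat = np.array([[0.0, 2.0, 0.1], [2.0, 0.0, 2.0], [0.1, 2.0, 0.0]])
linkage_matrix = linkage(squareform(mat), "single")

# Each row: the two clusters merged, the merge distance,
# and the number of original items in the new cluster.
print(linkage_matrix)

# Compute the dendrogram layout without drawing it.
layout = dendrogram(linkage_matrix, labels=["0", "1", "2"], no_plot=True)
print(layout["ivl"])  # leaf labels in left-to-right order
```

The first row should show items 0 and 2 merging at distance 0.1, and the second row should show item 1 joining that cluster at distance 2.0.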