I have an n*n distance matrix M, where M_ij is the distance between object_i and object_j. So as expected, it takes the fol
It seems that indeed we cannot directly pass the redundant square matrix in, although the documentation claims we can do so.
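A small sketch of the symptom, assuming four made-up points on a line and single linkage (neither comes from the question). Given a square array, `linkage` reads each row as an observation vector, so the merge heights come out different from those obtained with the condensed form. Depending on the SciPy version, the square call may also emit a warning that the input looks like an uncondensed distance matrix.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform

# Four hypothetical points on a line (illustrative data only).
points = np.array([0.0, 1.0, 3.0, 10.0])

# Redundant 4 x 4 distance matrix: symmetric, zero diagonal.
dist_matrix = np.abs(points[:, None] - points[None, :])

# Square input: every row is treated as a 4-dimensional observation,
# and Euclidean distances between the rows are computed internally.
z_square = linkage(dist_matrix, method="single")

# Condensed input: the values are used as the pairwise distances themselves.
z_condensed = linkage(squareform(dist_matrix), method="single")

# Column 2 of a linkage matrix holds the merge heights.
print("square input   :", z_square[:, 2])
print("condensed input:", z_condensed[:, 2])
```

The two printed rows of merge heights disagree, which is the practical reason to condense the matrix first.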
For the benefit of anyone who runs into the same problem in the future, I am posting my solution as an additional answer here, so it can be copied and pasted and you can get on with the clustering.
Use the following snippet to condense the matrix and happily proceed.
import scipy.spatial.distance as ssd
# Convert the redundant n*n square matrix form into a condensed array of length nC2.
# For i < j, distArray[{n choose 2} - {n-i choose 2} + (j-i-1)] is the distance between points i and j.
distArray = ssd.squareform(distMatrix)
Please correct me if I am wrong.
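As a sanity check on the index formula in the comment above, here is a minimal sketch that compares it against `squareform` for every pair i < j. The helper name `condensed_index` and the five sample points are made up for illustration, and `math.comb` assumes Python 3.8 or later.

```python
from math import comb

import numpy as np
import scipy.spatial.distance as ssd


def condensed_index(n, i, j):
    """Position of pair (i, j), with i < j, in the condensed array (hypothetical helper)."""
    if not 0 <= i < j < n:
        raise ValueError("expected 0 <= i < j < n")
    return comb(n, 2) - comb(n - i, 2) + (j - i - 1)


# Illustrative data: five points on a line, turned into a square matrix.
n = 5
coords = np.array([0.0, 1.0, 4.0, 9.0, 16.0])
distMatrix = np.abs(coords[:, None] - coords[None, :])

# Condense exactly as in the snippet above.
distArray = ssd.squareform(distMatrix)

# Compare every upper-triangle entry with the entry the formula points at.
all_match = all(
    distArray[condensed_index(n, i, j)] == distMatrix[i, j]
    for i in range(n)
    for j in range(i + 1, n)
)
print(all_match)
```

If the formula is right, the check holds for every pair, and `distArray` has length n(n-1)/2.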
For now you should pass in the 'condensed distance matrix', i.e. just the upper triangle of the distance matrix in vector form:
y = M[np.triu_indices(n,1)]
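A short sketch showing that this indexing gives the same vector as `scipy.spatial.distance.squareform`, and that the result can go straight into `linkage`. The sample points and the choice of average linkage are assumptions for illustration only.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform

# Illustrative data: four points on a line and their square distance matrix M.
coords = np.array([0.0, 2.0, 5.0, 11.0])
n = len(coords)
M = np.abs(coords[:, None] - coords[None, :])

# Upper triangle above the diagonal (offset 1), read row by row.
y = M[np.triu_indices(n, 1)]

# squareform produces the same condensed ordering.
same_as_squareform = np.array_equal(y, squareform(M))
print(same_as_squareform)

# The condensed vector is what linkage expects for precomputed distances.
Z = linkage(y, method="average")
print(Z.shape)  # one row per merge, so n - 1 rows
```

Either route works. `squareform` additionally validates that the matrix is symmetric with a zero diagonal, while the `triu_indices` form silently ignores whatever is in the lower triangle.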
From the discussion of @hongbo-zhu-cn's pull request it looks as though the solution will be to add an extra keyword argument to the linkage function that will allow the user to explicitly specify that they are passing in an n x n distance matrix rather than an m x n observation matrix.