Use Distance Matrix in scipy.cluster.hierarchy.linkage()?

你的背包 2020-11-29 03:19

I have a distance matrix n*n M where M_ij is the distance between object_i and object_j. So as expected, it takes the fol

2 Answers
  • 2020-11-29 03:54

    It seems that we indeed cannot pass the redundant square matrix in directly, even though the documentation claims we can.

    To help anyone who runs into the same problem later, I am writing up my solution as an additional answer here, so that those who just want to copy and paste can get on with the clustering.

    Use the following snippet to condense the matrix and happily proceed.

    import scipy.spatial.distance as ssd
    # convert the redundant n*n square matrix form into a condensed nC2 array
    # distArray[{n choose 2} - {n-i choose 2} + (j-i-1)] is the distance between points i and j (i < j)
    distArray = ssd.squareform(distMatrix)
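
For completeness, here is a minimal end-to-end sketch of condensing a matrix and then clustering it. The 4x4 matrix, the `average` method and the helper name `condensed_index` are my own assumptions for illustration, not part of the original question; the helper just spells out the index formula from the comment above.

```python
from math import comb

import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform

# Hypothetical symmetric distance matrix with a zero diagonal.
dist_matrix = np.array([
    [0.0, 1.0, 4.0, 5.0],
    [1.0, 0.0, 3.0, 6.0],
    [4.0, 3.0, 0.0, 2.0],
    [5.0, 6.0, 2.0, 0.0],
])
n = dist_matrix.shape[0]

# Condense the n*n square form into a vector of n*(n-1)/2 distances.
condensed = squareform(dist_matrix)


def condensed_index(n, i, j):
    """Position of the (i, j) distance in the condensed vector, for i < j."""
    return comb(n, 2) - comb(n - i, 2) + (j - i - 1)


# A 1-D input is interpreted by linkage() as condensed distances.
Z = linkage(condensed, method="average")
print(Z)
```

Each row of `Z` describes one merge: the two cluster ids, the distance between them, and the size of the new cluster.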
    

    Please correct me if I am wrong.

  • 2020-11-29 03:55

    For now you should pass in the 'condensed distance matrix', i.e. just the upper triangle of the distance matrix in vector form:

    import numpy as np
    y = M[np.triu_indices(n, 1)]  # n = M.shape[0]; offset 1 skips the diagonal
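
As a quick sanity check, the upper-triangle extraction should give the same vector, in the same order, as `scipy.spatial.distance.squareform`. This is only a sketch; the small matrix `M` below is made up for illustration.

```python
import numpy as np
from scipy.spatial.distance import squareform

# Hypothetical symmetric distance matrix with a zero diagonal.
M = np.array([
    [0.0, 2.0, 7.0],
    [2.0, 0.0, 5.0],
    [7.0, 5.0, 0.0],
])
n = M.shape[0]

# Upper triangle above the diagonal, read row by row.
y = M[np.triu_indices(n, 1)]

# squareform() produces the same condensed ordering.
same = np.array_equal(y, squareform(M))
print(y, same)
```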
    

    From the discussion of @hongbo-zhu-cn's pull request it looks as though the solution will be to add an extra keyword argument to the linkage function that will allow the user to explicitly specify that they are passing in an n x n distance matrix rather than an m x n observation matrix.
