I am new to Python and I need to implement a clustering algorithm. For that, I will need to calculate distances between the given input data.
Consider the following inpu
I suggest using pdist and squareform from scipy.spatial.distance.
Consider the following array of points:
import numpy as np
from scipy.spatial.distance import pdist, squareform
a = np.array([[1,2,8], [7,4,2], [9,1,7], [0,1,5], [6,4,3]])
If you want to display the distances between every pair of points (row 0 holds the distances from point [1,2,8] to all the others):
squareform(pdist(a))
Out[1]: array([[ 0.        ,  8.71779789,  8.1240384 ,  3.31662479,  7.34846923],
               [ 8.71779789,  0.        ,  6.164414  ,  8.18535277,  1.41421356],
               [ 8.1240384 ,  6.164414  ,  0.        ,  9.21954446,  5.83095189],
               [ 3.31662479,  8.18535277,  9.21954446,  0.        ,  7.        ],
               [ 7.34846923,  1.41421356,  5.83095189,  7.        ,  0.        ]])
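In case it helps to see what the two functions do separately, here is a small sketch. pdist returns a flat "condensed" array with one entry per unordered pair of points, and squareform expands it into the symmetric matrix shown above. The variable names condensed and full are just mine for illustration.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

a = np.array([[1, 2, 8], [7, 4, 2], [9, 1, 7], [0, 1, 5], [6, 4, 3]])

# Condensed form: 1-D array with one distance per unordered pair,
# ordered (0,1), (0,2), (0,3), (0,4), (1,2), ...
condensed = pdist(a)

# Square form: symmetric 5x5 matrix with zeros on the diagonal.
full = squareform(condensed)

print(condensed.shape)  # 5 points give 5 * 4 / 2 = 10 pairs
print(full.shape)
print(np.isclose(full[0, 3], condensed[2]))  # both are the distance between points 0 and 3
```

pdist uses the Euclidean metric by default; a different one can be requested through its metric argument, e.g. pdist(a, metric='cityblock').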
If you want to display the distance between point [1,2,8] and its closest point:
sorted(squareform(pdist(a))[0])[1]
Out[2]: 3.3166247903553998
[0] being the index of your first point ([1,2,8]), and [1] being the index of the second-smallest value (to skip the zero distance from the point to itself).
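If you only care about one point, you may not need the full matrix. As a rough alternative (not what the lines above do), cdist from the same module can compute just the distances from that point to all the others. The names d0, closest_distance and closest_index are only for illustration.

```python
import numpy as np
from scipy.spatial.distance import cdist

a = np.array([[1, 2, 8], [7, 4, 2], [9, 1, 7], [0, 1, 5], [6, 4, 3]])

# Distances from the first point to every point (including itself),
# taken as a flat array of length 5.
d0 = cdist(a[:1], a)[0]

# Replace the point's zero distance to itself with infinity
# so it can never be selected as the minimum.
d0[0] = np.inf

closest_distance = d0.min()
closest_index = int(d0.argmin())
print(closest_index, closest_distance)
```

Masking the self-distance instead of taking the second-smallest value also behaves sensibly if another point happens to sit at exactly the same coordinates.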
If you want to display the index of the closest point to [1,2,8]:
np.argsort(squareform(pdist(a))[0])[1]
Out[3]: 3
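Since you mention clustering, you will probably want the nearest neighbour of every point, not just the first one. A minimal sketch, assuming the full distance matrix fits in memory: put infinity on the diagonal and take the argmin of each row. The names D, nearest_index and nearest_distance are mine.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

a = np.array([[1, 2, 8], [7, 4, 2], [9, 1, 7], [0, 1, 5], [6, 4, 3]])

D = squareform(pdist(a))

# Put infinity on the diagonal so that no point is reported
# as its own nearest neighbour.
np.fill_diagonal(D, np.inf)

nearest_index = D.argmin(axis=1)   # index of the closest other point, per row
nearest_distance = D.min(axis=1)   # distance to that point, per row

print(nearest_index)
print(nearest_distance)
```

For this array, nearest_index[0] is 3, which agrees with the argsort result above.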