I am new to Python and I need to implement a clustering algorithm. For that, I will need to calculate distances between the given input data.
Consider the following inpu
Use np.linalg.norm
combined with broadcasting (numpy outer subtraction), you can do:
np.linalg.norm(a - a[:,None], axis=-1)
a[:,None]
insert a new axis into a
, a - a[:,None]
will then do a row by row subtraction due to broadcasting. np.linalg.norm
calculates the np.sqrt(np.sum(np.square(...)))
over the last axis:
a = np.array([[1,2,8],
[7,4,2],
[9,1,7],
[0,1,5],
[6,4,3]])
np.linalg.norm(a - a[:,None], axis=-1)
#array([[ 0. , 8.71779789, 8.1240384 , 3.31662479, 7.34846923],
# [ 8.71779789, 0. , 6.164414 , 8.18535277, 1.41421356],
# [ 8.1240384 , 6.164414 , 0. , 9.21954446, 5.83095189],
# [ 3.31662479, 8.18535277, 9.21954446, 0. , 7. ],
# [ 7.34846923, 1.41421356, 5.83095189, 7. , 0. ]])
The elements [0,1]
, [0,2]
for instance correspond to:
np.sqrt(np.sum((a[0] - a[1]) ** 2))
# 8.717797887081348
np.sqrt(np.sum((a[0] - a[2]) ** 2))
# 8.1240384046359608
respectively.