I am trying to find the fastest way to perform the following pairwise distance calculation in Python. I want to use the distances to rank a list_of_objects
by their
The normalization step, where you divide the pairwise distances by their maximum value, seems non-standard, so you may have trouble finding a ready-made function that does exactly what you are after. It is pretty easy to do yourself, though. A starting point is to turn your list_of_objects into an array:
>>> import numpy as np
>>> obj_arr = np.array(list_of_objects)
>>> obj_arr.shape
(3L, 4L)
You can then get the pairwise distances using broadcasting. This is a little inefficient, because it does not take advantage of the symmetry of your metric and so calculates every distance twice:
>>> dists = np.abs(obj_arr - obj_arr[:, None])
>>> dists.shape
(3L, 3L, 4L)
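If the duplicated work matters, one option is to compute only the pairs with i < j and mirror them into the full array. The following is a minimal sketch of that idea, not a drop-in replacement: the values in `obj_arr` are made-up sample data (the question's actual data is not shown), and `dists_sym` is a name introduced here for illustration.

```python
import numpy as np

# Hypothetical sample data: 3 objects with 4 numeric attributes each.
obj_arr = np.array([[1.0, 2.0, 3.0, 4.0],
                    [2.0, 0.0, 5.0, 1.0],
                    [4.0, 4.0, 4.0, 4.0]])

n, n_attrs = obj_arr.shape

# Index pairs (i, j) with i < j, so each unordered pair appears once.
i, j = np.triu_indices(n, k=1)
upper = np.abs(obj_arr[i] - obj_arr[j])   # shape (n*(n-1)/2, n_attrs)

# Mirror into the full symmetric array; the diagonal stays zero.
dists_sym = np.zeros((n, n, n_attrs))
dists_sym[i, j] = upper
dists_sym[j, i] = upper

# For comparison: the full broadcast version computes every pair twice.
dists_full = np.abs(obj_arr - obj_arr[:, None])
```

Whether this is actually faster depends on the array sizes, since the fancy indexing has its own overhead, so it is worth timing both on your data.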
Normalizing is very easy to do:
>>> dists /= dists.max(axis=(0, 1))
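Two caveats with the in-place division, sketched below under assumed data. If `obj_arr` has an integer dtype, `dists` is an integer array as well, and recent NumPy versions refuse to write the float quotient back into it. Also, an attribute that has the same value for every object has a maximum distance of 0, which would produce nan. The sample values and the `safe_max` name are hypothetical.

```python
import numpy as np

# Hypothetical integer data; the last attribute is constant across objects.
obj_arr = np.array([[1, 2, 3, 7],
                    [2, 0, 5, 7],
                    [4, 4, 4, 7]])

# Cast to float so the in-place division below is allowed.
dists = np.abs(obj_arr - obj_arr[:, None]).astype(float)

# Per-attribute maximum over all pairs, shape (n_attrs,).
max_per_attr = dists.max(axis=(0, 1))

# Replace zero maxima with 1 so constant attributes contribute 0, not nan.
safe_max = np.where(max_per_attr > 0, max_per_attr, 1.0)
dists /= safe_max
```

Treating a constant attribute as contributing zero distance is a design choice; you may prefer to drop such attributes instead.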
Your final weighting can be done in a variety of ways; you may want to benchmark which is fastest:
>>> dists.dot([1, 1, 1, 1])
array([[ 0.        ,  1.93813131,  2.21542674],
       [ 1.93813131,  0.        ,  3.84644195],
       [ 2.21542674,  3.84644195,  0.        ]])
>>> np.einsum('ijk,k->ij', dists, [1, 1, 1, 1])
array([[ 0.        ,  1.93813131,  2.21542674],
       [ 1.93813131,  0.        ,  3.84644195],
       [ 2.21542674,  3.84644195,  0.        ]])
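To compare the weighting variants, something along these lines could be used. It is only a rough sketch: the random `sample` array, its size, and the repeat count are arbitrary assumptions, and a third variant (multiply, then sum over the last axis) is included for reference. Timings will vary with array shape and NumPy build, so no expected numbers are given.

```python
import timeit
import numpy as np

# Hypothetical data: 50 objects with 4 attributes, fixed seed for repeatability.
rng = np.random.default_rng(0)
sample = rng.random((50, 4))
w = np.array([1.0, 1.0, 1.0, 1.0])

# Normalized pairwise distances, shape (50, 50, 4).
dists = np.abs(sample - sample[:, None])
dists /= dists.max(axis=(0, 1))

# Three equivalent ways to apply the weights along the last axis.
variants = {
    "dot": lambda: dists.dot(w),
    "einsum": lambda: np.einsum('ijk,k->ij', dists, w),
    "mul_sum": lambda: (dists * w).sum(axis=-1),
}

# All variants should agree before their speed is compared.
results = {name: fn() for name, fn in variants.items()}

for name, fn in variants.items():
    seconds = timeit.timeit(fn, number=200)
    print("%-8s %.4f s for 200 calls" % (name, seconds))
```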