Calculate weighted pairwise distance matrix in Python

前端未结

I am trying to find the fastest way to perform the following pairwise distance calculation in Python. I want to use the distances to rank a list_of_objects by their

2 Answers

后悔当初

2021-02-14 12:40

scipy.spatial.distance is the module you'll want to look at. It implements many distance metrics that can be applied directly to your data.
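As a quick sketch of how easily the metric can be swapped, the snippet below uses the three objects from the example in this answer. Only the metric name passed to pdist changes; each call returns a condensed vector with one distance per pair.

```python
import numpy as np
from scipy.spatial.distance import pdist

# The three objects from the example below, one object per row
X = np.array([[0.2, 4.5, 198, 0.003],
              [0.3, 2.0, 999, 0.001],
              [0.1, 9.2, 321, 0.023]])

# The metric is chosen by name; each result is a condensed vector with one
# distance per pair, in the order (0, 1), (0, 2), (1, 2)
for metric in ("euclidean", "cityblock", "chebyshev"):
    print(metric, pdist(X, metric))
```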

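I'd recommend using the weighted Minkowski metric. Older SciPy versions provided it as a separate wminkowski function; that function was deprecated in SciPy 1.6 and removed in 1.8. The same metric is now available by passing a weight vector to minkowski through its w argument, but the weights enter the formula differently, so old wminkowski weights have to be raised to the power p.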
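For reference, here are the two weighting conventions as written in the SciPy documentation for the two functions. For weights $w_i$ and order $p$, the old wminkowski applied the weight inside the power, while the w argument of minkowski multiplies each term:

$$
d_{\text{wminkowski}}(u, v) = \Big(\sum_i \lvert w_i (u_i - v_i)\rvert^p\Big)^{1/p},
\qquad
d_{\text{minkowski}}(u, v) = \Big(\sum_i w_i \lvert u_i - v_i\rvert^p\Big)^{1/p}
$$

Passing $w_i^p$ to minkowski therefore reproduces wminkowski with weights $w_i$. For example, with $p = 2$ and weights $[1, 1, 1, 10]$, object_1 and object_2 from the example give $\sqrt{0.1^2 + 2.5^2 + 801^2 + (10 \cdot 0.002)^2} = \sqrt{641607.2604} \approx 801.0039$.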

You can compute all pairwise distances with the pdist function from this module. For example:

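```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

object_1 = [0.2, 4.5, 198, 0.003]
object_2 = [0.3, 2.0, 999, 0.001]
object_3 = [0.1, 9.2, 321, 0.023]
list_of_objects = [object_1, object_2, object_3]

# make a 3x4 array from the list of objects
X = np.array(list_of_objects)

# calculate pairwise distances using the weighted Minkowski metric with p=2.
# The old wminkowski(u, v, p, w) computed (sum |w_i * (u_i - v_i)|**p)**(1/p);
# minkowski's w argument multiplies |u_i - v_i|**p instead, so the weights
# [1, 1, 1, 10] are passed as [1, 1, 1, 10]**p = [1, 1, 1, 100].
weights = np.array([1, 1, 1, 10])
p = 2
distances = pdist(X, 'minkowski', p=p, w=weights**p)

# make a square matrix from the condensed result
distances_as_2d_matrix = squareform(distances)

print(distances)
print(distances_as_2d_matrix)
```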

This will print:

```
[ 801.00390786  123.0899671   678.0382942 ]
[[   0.          801.00390786  123.0899671 ]
 [ 801.00390786    0.          678.0382942 ]
 [ 123.0899671   678.0382942     0.        ]]
```
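If you want to double-check what the weighted metric computes, or avoid the SciPy dependency, here is a minimal NumPy-only sketch. It assumes the old wminkowski convention (weights applied to the differences before raising to the power p), and the helper name weighted_minkowski_matrix is made up for illustration. On the three objects above it should match the distances printed by pdist.

```python
import numpy as np

# The three objects from the example above
X = np.array([[0.2, 4.5, 198, 0.003],
              [0.3, 2.0, 999, 0.001],
              [0.1, 9.2, 321, 0.023]])
weights = np.array([1, 1, 1, 10])  # applied to the differences, as wminkowski did
p = 2

def weighted_minkowski_matrix(X, weights, p):
    """Hypothetical helper: full n x n matrix of (sum_k |w_k * (x_ik - x_jk)|**p)**(1/p)."""
    # (n, 1, m) - (1, n, m) broadcasts to (n, n, m) feature-wise differences
    diffs = X[:, None, :] - X[None, :, :]
    return (np.abs(weights * diffs) ** p).sum(axis=-1) ** (1.0 / p)

D = weighted_minkowski_matrix(X, weights, p)

# Condensed form in the same pair order that pdist uses: (0, 1), (0, 2), (1, 2)
i, j = np.triu_indices(len(X), k=1)
condensed = D[i, j]
print(condensed)
```

The triu_indices call with k=1 enumerates the pairs i < j row by row, which is the same ordering squareform expects for a condensed distance vector.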


后悔当初

2021-02-14 12:41

The normalization step, where you divide the pairwise distances by their maximum value, seems non-standard and may make it hard to find a ready-made function that does exactly what you are after. It is easy enough to do yourself, though. A starting point is to turn your list_of_objects into an array:
```
>>> import numpy as np
>>> obj_arr = np.array(list_of_objects)
>>> obj_arr.shape
(3, 4)
```
You can then get the pairwise distances using broadcasting. This is a little inefficient, because it does not take advantage of the symmetry of your metric and calculates every distance twice:
```
>>> dists = np.abs(obj_arr - obj_arr[:, None])
>>> dists.shape
(3, 3, 4)
```
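One way to exploit that symmetry is to compute the differences only for pairs i < j and mirror the result at the end. This is a minimal sketch using the same normalization and weighting as this answer; the function name pairwise_weighted_upper is hypothetical. Whether it is actually faster than the broadcast version depends on the size of your data, so it is worth benchmarking.

```python
import numpy as np

def pairwise_weighted_upper(obj_arr, weights):
    """Hypothetical helper: normalized, weighted distances computed only for pairs i < j.

    Each unordered pair is handled once (no duplicates, no diagonal); the
    result is mirrored into a full symmetric n x n matrix at the end.
    """
    n = obj_arr.shape[0]
    i, j = np.triu_indices(n, k=1)                 # pairs (0, 1), (0, 2), ..., (n-2, n-1)
    pair_diffs = np.abs(obj_arr[i] - obj_arr[j])   # shape (n_pairs, n_features)
    pair_diffs /= pair_diffs.max(axis=0)           # per-feature max, like dists.max(axis=(0, 1))
    pair_dists = pair_diffs.dot(weights)           # weighted sum over features
    full = np.zeros((n, n))
    full[i, j] = pair_dists
    full[j, i] = pair_dists                        # fill the lower triangle by symmetry
    return full

obj_arr = np.array([[0.2, 4.5, 198, 0.003],
                    [0.3, 2.0, 999, 0.001],
                    [0.1, 9.2, 321, 0.023]])
print(pairwise_weighted_upper(obj_arr, [1, 1, 1, 1]))
```

The intermediate array here has n(n-1)/2 rows instead of n² entries along the pair axes, which is where any memory saving comes from.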
Normalizing each feature by its largest pairwise difference is then a one-liner:
```
>>> dists /= dists.max(axis=(0, 1))
```
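Two things about this normalization may be worth knowing. First, the largest |x_i - x_j| of a feature over all pairs is simply that column's range (max minus min), so the divisor can be computed without the 3D array. Second, a constant feature has a range of 0, and the in-place division then yields 0/0 = nan. The sketch below guards against that; the function name and the sample data with a constant third column are hypothetical.

```python
import numpy as np

def normalized_abs_diffs(obj_arr):
    """Hypothetical helper: per-feature |x_i - x_j| divided by that feature's range.

    The largest pairwise |x_i - x_j| of a column equals max - min of the column.
    Constant columns (range 0) would give 0/0 = nan, so their divisor is
    replaced by 1 and their normalized differences stay 0.
    """
    dists = np.abs(obj_arr - obj_arr[:, None])
    col_range = obj_arr.max(axis=0) - obj_arr.min(axis=0)
    return dists / np.where(col_range > 0, col_range, 1.0)

# Hypothetical data whose last feature is constant
obj_arr = np.array([[0.2, 4.5, 5.0],
                    [0.3, 2.0, 5.0],
                    [0.1, 9.2, 5.0]])
print(normalized_abs_diffs(obj_arr)[..., 2])  # zeros rather than nan
```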
The final weighting can be done in several ways; you may want to benchmark which is fastest:
```
>>> dists.dot([1, 1, 1, 1])
array([[ 0.        ,  1.93813131,  2.21542674],
       [ 1.93813131,  0.        ,  3.84644195],
       [ 2.21542674,  3.84644195,  0.        ]])
>>> np.einsum('ijk,k->ij', dists, [1, 1, 1, 1])
array([[ 0.        ,  1.93813131,  2.21542674],
       [ 1.93813131,  0.        ,  3.84644195],
       [ 2.21542674,  3.84644195,  0.        ]])
```
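A rough benchmarking sketch follows, assuming random test data of 300 objects with 4 features (the data and all function names are hypothetical). It compares the dot and einsum variants above with a third route: because each feature's largest pairwise difference is its range, you can scale the columns first and then use pdist's weighted Minkowski metric with p=1 (a weighted city-block distance), which avoids building the (n, n, m) array. The script asserts that all three agree before timing them; the timings depend on your machine and data size, so none are claimed here.

```python
import timeit

import numpy as np
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(0)
obj_arr = rng.random((300, 4))           # hypothetical test data: 300 objects, 4 features
weights = np.array([1.0, 1.0, 1.0, 1.0])

def via_broadcast_dot(X, w):
    dists = np.abs(X - X[:, None])
    dists /= dists.max(axis=(0, 1))
    return dists.dot(w)

def via_broadcast_einsum(X, w):
    dists = np.abs(X - X[:, None])
    dists /= dists.max(axis=(0, 1))
    return np.einsum('ijk,k->ij', dists, w)

def via_pdist(X, w):
    # Divide each column by its range (= its largest pairwise difference), then
    # take a weighted L1 distance: minkowski with p=1 gives sum_k w_k |u_k - v_k|
    scaled = X / (X.max(axis=0) - X.min(axis=0))
    return squareform(pdist(scaled, 'minkowski', p=1, w=w))

# Check that all three approaches give the same matrix before timing them
reference = via_broadcast_dot(obj_arr, weights)
for f in (via_broadcast_einsum, via_pdist):
    assert np.allclose(f(obj_arr, weights), reference)

for f in (via_broadcast_dot, via_broadcast_einsum, via_pdist):
    t = timeit.timeit(lambda: f(obj_arr, weights), number=10)
    print(f"{f.__name__}: {t / 10 * 1e3:.2f} ms per call")
```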