Efficiently Calculating a Euclidean Distance Matrix Using Numpy

后端 未结 4 468
我在风中等你
我在风中等你 2020-11-30 02:31

I have a set of points in 2-dimensional space and need to calculate the distance from each point to each other point.

I have a relatively small number of points, ma

相关标签:
4条回答
  • 2020-11-30 03:03

    Jake Vanderplas gives this example using broadcasting in Python Data Science Handbook, which is very similar to what @shx2 proposed.

    import numpy as np
    rand = random.RandomState(42)
    X = rand.rand(3, 2)  
    dist_sq = np.sum((X[:, np.newaxis, :] - X[np.newaxis, :, :]) ** 2, axis = -1)
    
    dist_sq
    array([[0.        , 0.18543317, 0.81602495],
           [0.18543317, 0.        , 0.22819282],
           [0.81602495, 0.22819282, 0.        ]])
    
    0 讨论(0)
  • 2020-11-30 03:13

    Here is how you can do it using numpy:

    import numpy as np
    
    x = np.array([0,1,2])
    y = np.array([2,4,6])
    
    # take advantage of broadcasting, to make a 2dim array of diffs
    dx = x[..., np.newaxis] - x[np.newaxis, ...]
    dy = y[..., np.newaxis] - y[np.newaxis, ...]
    dx
    => array([[ 0, -1, -2],
              [ 1,  0, -1],
              [ 2,  1,  0]])
    
    # stack in one array, to speed up calculations
    d = np.array([dx,dy])
    d.shape
    => (2, 3, 3)
    

    Now all is left is computing the L2-norm along the 0-axis (as discussed here):

    (d**2).sum(axis=0)**0.5
    => array([[ 0.        ,  2.23606798,  4.47213595],
              [ 2.23606798,  0.        ,  2.23606798],
              [ 4.47213595,  2.23606798,  0.        ]])
    
    0 讨论(0)
  • 2020-11-30 03:21

    If you don't need the full distance matrix, you will be better off using kd-tree. Consider scipy.spatial.cKDTree or sklearn.neighbors.KDTree. This is because a kd-tree kan find k-nearnest neighbors in O(n log n) time, and therefore you avoid the O(n**2) complexity of computing all n by n distances.

    0 讨论(0)
  • 2020-11-30 03:23

    You can take advantage of the complex type :

    # build a complex array of your cells
    z = np.array([complex(c.m_x, c.m_y) for c in cells])
    

    First solution

    # mesh this array so that you will have all combinations
    m, n = np.meshgrid(z, z)
    # get the distance via the norm
    out = abs(m-n)
    

    Second solution

    Meshing is the main idea. But numpy is clever, so you don't have to generate m & n. Just compute the difference using a transposed version of z. The mesh is done automatically :

    out = abs(z[..., np.newaxis] - z)
    

    Third solution

    And if z is directly set as a 2-dimensional array, you can use z.T instead of the weird z[..., np.newaxis]. So finally, your code will look like this :

    z = np.array([[complex(c.m_x, c.m_y) for c in cells]]) # notice the [[ ... ]]
    out = abs(z.T-z)
    

    Example

    >>> z = np.array([[0.+0.j, 2.+1.j, -1.+4.j]])
    >>> abs(z.T-z)
    array([[ 0.        ,  2.23606798,  4.12310563],
           [ 2.23606798,  0.        ,  4.24264069],
           [ 4.12310563,  4.24264069,  0.        ]])
    

    As a complement, you may want to remove duplicates afterwards, taking the upper triangle :

    >>> np.triu(out)
    array([[ 0.        ,  2.23606798,  4.12310563],
           [ 0.        ,  0.        ,  4.24264069],
           [ 0.        ,  0.        ,  0.        ]])
    

    Some benchmarks

    >>> timeit.timeit('abs(z.T-z)', setup='import numpy as np;z = np.array([[0.+0.j, 2.+1.j, -1.+4.j]])')
    4.645645342274779
    >>> timeit.timeit('abs(z[..., np.newaxis] - z)', setup='import numpy as np;z = np.array([0.+0.j, 2.+1.j, -1.+4.j])')
    5.049334864854522
    >>> timeit.timeit('m, n = np.meshgrid(z, z); abs(m-n)', setup='import numpy as np;z = np.array([0.+0.j, 2.+1.j, -1.+4.j])')
    22.489568296184686
    
    0 讨论(0)
提交回复
热议问题