Computing Euclidean distance for numpy in python

前端 未结 3 1672
伪装坚强ぢ
伪装坚强ぢ 2020-12-09 01:03

I am new to Python so this question might look trivia. However, I did not find a similar case to mine. I have a matrix of coordinates for 20 nodes. I want to compute the euc

相关标签:
3条回答
  • 2020-12-09 01:10

    There are much, much faster alternatives to using nested for loops for this. I'll show you two different approaches - the first will be a more general method that will introduce you to broadcasting and vectorization, and the second uses a more convenient scipy library function.


    1. The general way, using broadcasting & vectorization

    One of the first things I'd suggest doing is switching to using np.array rather than np.matrix. Arrays are preferred for a number of reasons, most importantly because they can have >2 dimensions, and they make element-wise multiplication much less awkward.

    import numpy as np
    
    ncoord = np.array(ncoord)
    

    With an array, we can eliminate the nested for loops by inserting a new singleton dimension and broadcasting the subtraction over it:

    # indexing with None (or np.newaxis) inserts a new dimension of size 1
    print(ncoord[:, :, None].shape)
    # (20, 2, 1)
    
    # by making the 'inner' dimensions equal to 1, i.e. (20, 2, 1) - (1, 2, 20),
    # the subtraction is 'broadcast' over every pair of rows in ncoord
    xydiff = ncoord[:, :, None] - ncoord[:, :, None].T
    
    print(xydiff.shape)
    # (20, 2, 20)
    

    This is equivalent to looping over every pair of rows using nested for loops, but much, much faster!

    xydiff2 = np.zeros((20, 2, 20), dtype=xydiff.dtype)
    for ii in range(20):
        for jj in range(20):
            for kk in range(2):
                xydiff[ii, kk, jj] = ncoords[ii, kk] - ncoords[jj, kk]
    
    # check that these give the same result
    print(np.all(xydiff == xydiff2))
    # True
    

    The rest we can also do using vectorized operations:

    # we square the differences and sum over the 'middle' axis, equivalent to
    # computing (x_i - x_j) ** 2 + (y_i - y_j) ** 2
    ssdiff = (xydiff * xydiff).sum(1)
    
    # finally we take the square root
    D = np.sqrt(ssdiff)
    

    The whole thing could be done in one line like this:

    D = np.sqrt(((ncoord[:, :, None] - ncoord[:, :, None].T) ** 2).sum(1))
    

    2. The lazy way, using pdist

    It turns out that there's already a fast and convenient function for computing all pairwise distances: scipy.spatial.distance.pdist.

    from scipy.spatial.distance import pdist, squareform
    
    d = pdist(ncoord)
    
    # pdist just returns the upper triangle of the pairwise distance matrix. to get
    # the whole (20, 20) array we can use squareform:
    
    print(d.shape)
    # (190,)
    
    D2 = squareform(d)
    print(D2.shape)
    # (20, 20)
    
    # check that the two methods are equivalent
    print np.all(D == D2)
    # True
    
    0 讨论(0)
  • 2020-12-09 01:26
    for i in range(0, n):
        for j in range(i+1, n):
            c[i, j] = math.sqrt((ncoord[i, 0] - ncoord[j, 0])**2 
            + (ncoord[i, 1] - ncoord[j, 1])**2)
    

    Note: ncoord[i, j] is not the same as ncoord[i][j] for a Numpy matrix. This appears to be the source of confusion. If ncoord is a Numpy array then they will give the same result.

    For a Numpy matrix, ncoord[i] returns the ith row of ncoord, which itself is a Numpy matrix object with shape 1 x 2 in your case. Therefore, ncoord[i][j] actually means: take the ith row of ncoord and take the jth row of that 1 x 2 matrix. This is where your indexing problems comes about when j > 0.

    Regarding your comments on assigning to c[i][j] "working", it shouldn't. At least on my build of Numpy 1.9.1 it shouldn't work if your indices i and j iterates up to n.

    As an aside, remember to add the transpose of the matrix c to itself.

    It is recommended to use Numpy arrays instead of matrix. See this post.

    If your coordinates are stored as a Numpy array, then pairwise distance can be computed as:

    from scipy.spatial.distance import pdist
    
    pairwise_distances = pdist(ncoord, metric="euclidean", p=2)
    

    or simply

    pairwise_distances = pdist(ncoord)
    

    since the default metric is "euclidean", and default "p" is 2.

    In a comment below I mistakenly mentioned that the result of pdist is a n x n matrix. To get a n x n matrix, you will need to do the following:

    from scipy.spatial.distance import pdist, squareform
    
    pairwise_distances = squareform(pdist(ncoord))
    

    or

    from scipy.spatial.distance import cdist
    
    pairwise_distances = cdist(ncoord, ncoord)
    
    0 讨论(0)
  • 2020-12-09 01:29

    What I figure you wanted to do: You said you wanted a 20 by 20 matrix... but the one you coded is triangular.

    Thus I coded a complete 20x20 matrix instead.

    distances = []
    for i in range(len(ncoord)):
        given_i = []
        for j in range(len(ncoord)):
            d_val = math.sqrt((ncoord[i, 0]-ncoord[j,0])**2+(ncoord[i,1]-ncoord[j,1])**2)
            given_i.append(d_val)
    
        distances.append(given_i)
    
        # distances[i][j] = distance from i to j
    

    SciPy way:

    from scipy.spatial.distance import cdist
    # Isn't scipy nice - can also use pdist... works in the same way but different recall method.
    distances = cdist(ncoord, ncoord, 'euclidean')
    
    0 讨论(0)
提交回复
热议问题