Python calculate lots of distances quickly

佛祖请我去吃肉 2021-02-06 08:02

I have an input of 36,742 points, which means that if I wanted to calculate the lower triangle of a distance matrix (using the Vincenty approximation) I would need to generate 36,742*

4 answers
  •  梦如初夏
    2021-02-06 08:59

    This sounds like a classic use case for k-D trees.

    If you first transform your points into Euclidean space then you can use the query_pairs method of scipy.spatial.cKDTree:

    from scipy.spatial import cKDTree
    
    tree = cKDTree(data)
    # where data is (nshops, ndim) containing the Euclidean coordinates of each shop
    # in units of km
    
    pairs = tree.query_pairs(50, p=2)   # 50km radius, L2 (Euclidean) norm
    

    pairs will be a set of (i, j) tuples corresponding to the row indices of pairs of shops that are ≤50km from each other.
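
The "transform your points into Euclidean space" step can be sketched as follows. This is only a rough illustration, assuming a spherical Earth with a mean radius of 6371 km (so it will differ slightly from Vincenty's ellipsoidal distances). The helper names latlon_to_xyz and arc_to_chord and the sample coordinates are hypothetical. Because the tree measures straight-line (chord) distance through the sphere, the 50 km surface radius is first converted to the matching chord length.

```python
import numpy as np
from scipy.spatial import cKDTree

EARTH_RADIUS_KM = 6371.0  # assumption: spherical Earth, mean radius


def latlon_to_xyz(lat_deg, lon_deg, radius=EARTH_RADIUS_KM):
    """Convert latitude/longitude in degrees to 3-D Cartesian coordinates in km."""
    lat = np.radians(lat_deg)
    lon = np.radians(lon_deg)
    x = radius * np.cos(lat) * np.cos(lon)
    y = radius * np.cos(lat) * np.sin(lon)
    z = radius * np.sin(lat)
    return np.column_stack((x, y, z))


def arc_to_chord(arc_km, radius=EARTH_RADIUS_KM):
    """Chord length (km) that corresponds to a great-circle distance (km)."""
    return 2.0 * radius * np.sin(arc_km / (2.0 * radius))


# Hypothetical sample points on the equator, given as (lat, lon) in degrees.
latlon = np.array([
    [0.0, 0.0],
    [0.0, 0.3],   # roughly 33 km east of the first point
    [0.0, 2.0],   # roughly 222 km east of the first point
])

data = latlon_to_xyz(latlon[:, 0], latlon[:, 1])   # shape (npoints, 3), in km
tree = cKDTree(data)

# All index pairs (i, j), i < j, within 50 km of each other along the surface.
pairs = tree.query_pairs(arc_to_chord(50.0), p=2)
```

For a radius as small as 50 km the chord and the arc differ by well under a metre, so the conversion matters mainly for much larger search radii.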


    If you also need the distances themselves, and not just the indices of the nearby pairs, you can use tree.sparse_distance_matrix. By default its output is a scipy.sparse.dok_matrix. Since the matrix will be symmetric and you're only interested in unique row/column pairs, you could use scipy.sparse.tril to zero out the upper triangle and the main diagonal, giving you a scipy.sparse.coo_matrix. From there you can access the nonzero row and column indices and their corresponding distance values via the .row, .col and .data attributes:

    from scipy import sparse
    
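    # max_distance is in the same units as data (km here); more distant pairs are left out
    tree_dist = tree.sparse_distance_matrix(tree, max_distance=10000, p=2)
    udist = sparse.tril(tree_dist, k=-1)    # keep only entries strictly below the main diagonal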
    ridx = udist.row    # row indices
    cidx = udist.col    # column indices
    dist = udist.data   # distance values
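
To make the sparse-matrix approach concrete, here is a small self-contained sketch. The three shop coordinates are made up for illustration and are assumed to be Euclidean already, in km. The 50 km cutoff is likewise an assumption, chosen to match the query_pairs radius used earlier.

```python
import numpy as np
from scipy import sparse
from scipy.spatial import cKDTree

# Hypothetical shop coordinates in km (already in Euclidean space).
data = np.array([
    [0.0, 0.0],
    [3.0, 4.0],     # exactly 5 km from shop 0
    [100.0, 0.0],   # more than 50 km from both other shops
])
tree = cKDTree(data)

# Distances between all pairs no more than 50 km apart; other entries are omitted.
tree_dist = tree.sparse_distance_matrix(tree, max_distance=50, p=2)

# Keep only entries strictly below the main diagonal, so each pair appears once.
udist = sparse.tril(tree_dist, k=-1)

# Collect (row, column, distance) triples as plain Python values.
close_pairs = [
    (int(i), int(j), float(d))
    for i, j, d in zip(udist.row, udist.col, udist.data)
]
print(close_pairs)  # one entry: shops 1 and 0, 5 km apart
```

Only the stored pairs take up memory, which is what makes this workable for tens of thousands of points when the search radius is small relative to the spread of the data.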
    
