Pairwise displacement vectors among a set of points

一生所求 2021-01-18 00:02

I have an array of N points in d dimensions, shape (N, d), and I'd like to make a new array containing the displacement vector for each pair of points.
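To pin down the shapes: for an (N, d) array the result should have one row per unordered pair, i.e. shape (N*(N-1)/2, d). A minimal sketch (the pair ordering here follows itertools.combinations, which is just one possible convention):

    import numpy as np
    from itertools import combinations

    N, d = 5, 3
    points = np.random.rand(N, d)   # (N, d) array of points

    # one displacement vector per unordered pair -> C(N, 2) rows of dimension d
    disp = np.array([points[i] - points[j] for i, j in combinations(range(N), 2)])
    print(disp.shape)               # (10, 3), since C(5, 2) = 10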

2 Answers
  • 2021-01-18 00:31

    A straightforward approach would be:

    import itertools

    dis_vectors = [l - r for l, r in itertools.combinations(points, 2)]
    

    but I doubt that it is fast. Actually %timeit says:

    For 3 points:

    list : 13 us
    pdist: 24 us
    

    But already for 27 points:

    list : 798 us
    pdist: 35.2 us
    

    About how many points are we talking here?

    Another possibility is something like

    import numpy
    from functools import reduce  # reduce is not a builtin on Python 3
    from operator import mul
    from fractions import Fraction
    
    def binomial_coefficient(n,k):
        # credit to http://stackoverflow.com/users/226086/nas-banov
        return int( reduce(mul, (Fraction(n-i, i+1) for i in range(k)), 1) )
    
    def pairwise_displacements(a):
        n = a.shape[0]
        d = a.shape[1]
        c = binomial_coefficient(n, 2)
    
        out = numpy.zeros( (c, d) )
    
        l = 0
        r = l + n - 1
        for sl in range(1, n): # no point1 - point1!
            out[l:r] = a[:n-sl] - a[sl:]
            l = r
            r += n - (sl + 1)
        return out
    

    This simply "slides" the array against itself at every offset and performs a (broadcast) subtraction at each step. Note that no pair is repeated and no self-pairs (e.g. point1 - point1) are included.

    This function still performs well in the 1000-point range at 31.3 ms, whereas pdist is still faster at 20.7 ms, and the list comprehension comes in third at 1.23 s.
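    As a quick sanity check, the sliding version produces the same set of displacement vectors as the itertools list comprehension, just grouped by index offset rather than by first index (a minimal sketch, assuming pairwise_displacements from above is already defined):

    import itertools
    import numpy as np

    points = np.random.rand(4, 2)   # 4 points in 2D

    slid = pairwise_displacements(points)
    listed = np.array([l - r for l, r in itertools.combinations(points, 2)])

    # same C(4, 2) = 6 displacement vectors, different row order:
    # the sliding loop emits all offset-1 pairs first, then offset-2, ...
    assert sorted(map(tuple, slid)) == sorted(map(tuple, listed))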

  • 2021-01-18 00:32

    If you compute the full Cartesian product of differences, reshape the resulting (n, n, 2) array into a 2D one, and create your own indices to extract the upper triangle, you can get it to be "only" 6x slower than pdist:

    In [39]: points = np.random.rand(1000, 2)
    
    In [40]: %timeit pdist(points)
    100 loops, best of 3: 5.81 ms per loop
    
    In [41]: %%timeit
        ...: n = len(points)
        ...: rng = np.arange(1, n)
        ...: idx = np.arange(n *(n-1) // 2) + np.repeat(np.cumsum(rng), rng[::-1])
        ...: np.take((points[:, None] - points).reshape(-1, 2), idx, axis=0)
        ...: 
    10 loops, best of 3: 33.9 ms per loop
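
    The same idea wrapped as a function and generalized from 2 to d columns (a sketch for readability; the helper name is just for illustration):

    import numpy as np

    def displacements_full_product(points):   # hypothetical helper name
        # Full (n, n, d) table of differences, flattened so that flat row
        # i*n + j holds points[i] - points[j].
        n, d = points.shape
        flat = (points[:, None] - points).reshape(-1, d)

        # Keep only the pairs with j > i. For the k-th such pair, the flat row
        # index is k plus an offset that depends on which block i it falls in;
        # repeating the cumulative sums 1, 3, 6, ... over the block lengths
        # n-1, n-2, ... reproduces exactly that offset.
        rng = np.arange(1, n)
        idx = np.arange(n * (n - 1) // 2) + np.repeat(np.cumsum(rng), rng[::-1])
        return np.take(flat, idx, axis=0)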
    

    You can also speed up your own solution by creating the indices yourself and using take instead of fancy indexing:

    In [75]: %%timeit
        ...: n = len(points)
        ...: rng = np.arange(1, n)
        ...: idx1 = np.repeat(rng - 1, rng[::-1])
        ...: idx2 = np.arange(n*(n-1)//2) + np.repeat(n - np.cumsum(rng[::-1]), rng[::-1])
        ...: np.take(points, idx1, axis=0) - np.take(points, idx2, axis=0)
        ...: 
    10 loops, best of 3: 38.8 ms per loop
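
    The index construction is the only subtle part, so a quick check that idx1 and idx2 enumerate the (i, j) pairs in the same lexicographic order as itertools.combinations (a minimal sketch):

    import itertools
    import numpy as np

    n = 6
    rng = np.arange(1, n)
    idx1 = np.repeat(rng - 1, rng[::-1])
    idx2 = np.arange(n * (n - 1) // 2) + np.repeat(n - np.cumsum(rng[::-1]), rng[::-1])

    # idx1[k], idx2[k] should be the k-th pair (i, j) with i < j
    assert list(zip(idx1.tolist(), idx2.tolist())) == list(itertools.combinations(range(n), 2))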
    