I have an array of N points in d dimensions (N, d) and I'd like to make a new array of all the displacement vectors for each pair.
The straightforward approach would be
dis_vectors = [l - r for l, r in itertools.combinations(points, 2)]
but I doubt that it is fast. Actually, %timeit says:
For 3 points:
list : 13 us
pdist: 24 us
But already for 27 points:
list : 798 us
pdist: 35.2 us
About how many points are we talking here?
Another possibility would be something like:
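import numpy
from functools import reduce  # reduce is no longer a builtin in Python 3
from operator import mul
from fractions import Fraction

def binomial_coefficient(n, k):
    # credit to http://stackoverflow.com/users/226086/nas-banov
    return int(reduce(mul, (Fraction(n - i, i + 1) for i in range(k)), 1))

def pairwise_displacements(a):
    n = a.shape[0]  # number of points
    d = a.shape[1]  # number of dimensions
    c = binomial_coefficient(n, 2)  # number of unordered pairs
    out = numpy.zeros((c, d))
    # out[l:r] is the block of rows filled in the current step
    l = 0
    r = l + n - 1
    for sl in range(1, n):  # no point1 - point1!
        # pair every point i with point i + sl
        out[l:r] = a[:n - sl] - a[sl:]
        l = r
        r += n - (sl + 1)
    return out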
This simply "slides" the array against itself along the first axis (the point index) and performs a vectorized subtraction at each step. Each unordered pair appears exactly once, and no point is paired with itself (e.g. no point1 - point1).
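To make the sliding concrete, here is a minimal sketch on four 2-D points. The helper name `sliding_displacements` and the sample coordinates are made up for illustration; it builds the same blocks as the loop above but collects them with `numpy.concatenate`, so it assumes at least two points. It also shows that the rows come out grouped by offset rather than in `itertools.combinations` order.

```python
import itertools
import numpy as np

def sliding_displacements(a):
    """Differences a[i] - a[j] for all i < j, grouped by the offset j - i."""
    n = a.shape[0]
    # One block per offset s: a[:n-s] - a[s:] pairs point i with point i + s.
    blocks = [a[:n - s] - a[s:] for s in range(1, n)]
    return np.concatenate(blocks, axis=0)  # needs n >= 2

# Hypothetical sample data: four points in two dimensions.
points = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
out = sliding_displacements(points)

# Row order of the result: offset 1 first, then offset 2, then offset 3.
pair_order = [(i, i + s)
              for s in range(1, len(points))
              for i in range(len(points) - s)]

# Same set of rows as the list comprehension, just in a different order.
reference = np.array([l - r for l, r in itertools.combinations(points, 2)])
same_rows = sorted(map(tuple, out)) == sorted(map(tuple, reference))
```

If downstream code relies on the `combinations` ordering, the rows would have to be reordered, or the pairs generated in that order in the first place.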
This function still performs well in the 1000-point range at 31.3 ms, whereas pdist is still faster at 20.7 ms, and the list comprehension takes third place at 1.23 s.
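A different route, not part of the answer above and not timed here, is to let NumPy generate the index pairs. The sketch below assumes `points` is an (N, d) array; the function name `displacements_by_index` is hypothetical. `numpy.triu_indices(n, k=1)` returns the index pairs i < j in row-major order, which matches the order of `itertools.combinations`.

```python
import itertools
import numpy as np

def displacements_by_index(points):
    """Differences points[i] - points[j] for all i < j, in combinations order."""
    n = points.shape[0]
    # Indices of the strict upper triangle of an n x n matrix: all pairs i < j.
    i, j = np.triu_indices(n, k=1)
    # Fancy indexing gathers both sides of every pair, then subtracts row-wise.
    return points[i] - points[j]

# Hypothetical sample data: five points in three dimensions.
points = np.arange(15, dtype=float).reshape(5, 3) ** 2
vectors = displacements_by_index(points)

# Reference result from the list comprehension in the question.
reference = np.array([l - r for l, r in itertools.combinations(points, 2)])
```

Note that the fancy indexing builds two temporary arrays of shape (N(N-1)/2, d), so memory use grows quadratically with the number of points.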