I have an array of N points in d dimensions, shape (N, d), and I'd like to make a new array of all the displacement vectors for each pair.
If you compute the full Cartesian product of differences, reshape the resulting (N, N, d) array to (N*N, d), and build your own indices to extract the upper triangle, you can get it to be "only" about 6x slower than scipy's pdist:
In [39]: points = np.random.rand(1000, 2)
In [40]: %timeit pdist(points)
100 loops, best of 3: 5.81 ms per loop
In [41]: %%timeit
...: n = len(points)
...: rng = np.arange(1, n)
...: idx = np.arange(n*(n-1)//2) + np.repeat(np.cumsum(rng), rng[::-1])
...: np.take((points[:, None] - points).reshape(-1, 2), idx, axis=0)
...:
10 loops, best of 3: 33.9 ms per loop
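If it helps to see why those indices work, here is a minimal sketch of the same idea wrapped in a function and checked against np.triu_indices. The function name pair_displacements_full is made up for illustration, and I've replaced the hard-coded 2 with the actual number of columns so it should handle any d:

```python
import numpy as np

def pair_displacements_full(points):
    """Return points[i] - points[j] for every pair i < j.

    Hypothetical helper name; builds the full (n, n, d) difference
    array first, then picks out the upper-triangle rows.
    """
    n, d = points.shape
    rng = np.arange(1, n)
    # In the flattened (n*n, d) array, pair (i, j) sits at row i*n + j.
    # Adding the repeated cumulative sums to 0..n*(n-1)/2-1 yields
    # exactly those rows for i < j, in row-major order.
    idx = np.arange(n * (n - 1) // 2) + np.repeat(np.cumsum(rng), rng[::-1])
    diffs = (points[:, None] - points).reshape(-1, d)
    return np.take(diffs, idx, axis=0)

points = np.random.default_rng(0).random((6, 3))
disp = pair_displacements_full(points)

# Cross-check against the straightforward fancy-indexing version.
i, j = np.triu_indices(len(points), k=1)
assert np.array_equal(disp, points[i] - points[j])
```

Keep in mind this still allocates the whole n*n*d intermediate array, so memory grows quadratically with the number of points.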
You can also speed up your solution by creating the indices yourself and using take instead of fancy indexing:
In [75]: %%timeit
...: n = len(points)
...: rng = np.arange(1, n)
...: idx1 = np.repeat(rng - 1, rng[::-1])
...: idx2 = np.arange(n*(n-1)//2) + np.repeat(n - np.cumsum(rng[::-1]), rng[::-1])
...: np.take(points, idx1, axis=0) - np.take(points, idx2, axis=0)
...:
10 loops, best of 3: 38.8 ms per loop
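For reference, a rough sketch of that second variant as a reusable function, again with a made-up name (pair_displacements_take). It assumes points is a 2D array of shape (N, d) and checks the index construction against np.triu_indices:

```python
import numpy as np

def pair_displacements_take(points):
    """Return points[i] - points[j] for every pair i < j.

    Hypothetical helper name; gathers the two endpoints of each pair
    directly, so no (n, n, d) intermediate array is created.
    """
    n = len(points)
    rng = np.arange(1, n)
    counts = rng[::-1]  # row i has n-1-i partners j > i
    # First index of each pair: 0 repeated n-1 times, 1 repeated n-2 times, ...
    idx1 = np.repeat(rng - 1, counts)
    # Second index of each pair: i+1 .. n-1 within each block of row i.
    idx2 = np.arange(n * (n - 1) // 2) + np.repeat(n - np.cumsum(counts), counts)
    return np.take(points, idx1, axis=0) - np.take(points, idx2, axis=0)

points = np.random.default_rng(1).random((5, 2))
disp = pair_displacements_take(points)

# The index arrays should match the upper-triangle indices NumPy generates.
i, j = np.triu_indices(len(points), k=1)
assert np.array_equal(disp, points[i] - points[j])

# Norms of the displacements give the pairwise Euclidean distances,
# in the same pair order as the rows of disp.
dist = np.linalg.norm(disp, axis=1)
```

The pairs come out in the order (0,1), (0,2), ..., (n-2,n-1), which is the same ordering pdist uses for its condensed output, so dist should be directly comparable to pdist(points).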