I have a set of points in 2-dimensional space and need to calculate the distance from each point to each other point.
I have a relatively small number of points, ma
Jake Vanderplas gives this example using broadcasting in Python Data Science Handbook, which is very similar to what @shx2 proposed.
import numpy as np
rand = random.RandomState(42)
X = rand.rand(3, 2)
dist_sq = np.sum((X[:, np.newaxis, :] - X[np.newaxis, :, :]) ** 2, axis = -1)
dist_sq
array([[0. , 0.18543317, 0.81602495],
[0.18543317, 0. , 0.22819282],
[0.81602495, 0.22819282, 0. ]])
Here is how you can do it using numpy:
import numpy as np
x = np.array([0,1,2])
y = np.array([2,4,6])
# take advantage of broadcasting, to make a 2dim array of diffs
dx = x[..., np.newaxis] - x[np.newaxis, ...]
dy = y[..., np.newaxis] - y[np.newaxis, ...]
dx
=> array([[ 0, -1, -2],
[ 1, 0, -1],
[ 2, 1, 0]])
# stack in one array, to speed up calculations
d = np.array([dx,dy])
d.shape
=> (2, 3, 3)
Now all is left is computing the L2-norm along the 0-axis (as discussed here):
(d**2).sum(axis=0)**0.5
=> array([[ 0. , 2.23606798, 4.47213595],
[ 2.23606798, 0. , 2.23606798],
[ 4.47213595, 2.23606798, 0. ]])
If you don't need the full distance matrix, you will be better off using kd-tree. Consider scipy.spatial.cKDTree
or sklearn.neighbors.KDTree
. This is because a kd-tree kan find k-nearnest neighbors in O(n log n) time, and therefore you avoid the O(n**2) complexity of computing all n by n distances.
You can take advantage of the complex
type :
# build a complex array of your cells
z = np.array([complex(c.m_x, c.m_y) for c in cells])
# mesh this array so that you will have all combinations
m, n = np.meshgrid(z, z)
# get the distance via the norm
out = abs(m-n)
Meshing is the main idea. But numpy
is clever, so you don't have to generate m
& n
. Just compute the difference using a transposed version of z
. The mesh is done automatically :
out = abs(z[..., np.newaxis] - z)
And if z
is directly set as a 2-dimensional array, you can use z.T
instead of the weird z[..., np.newaxis]
. So finally, your code will look like this :
z = np.array([[complex(c.m_x, c.m_y) for c in cells]]) # notice the [[ ... ]]
out = abs(z.T-z)
>>> z = np.array([[0.+0.j, 2.+1.j, -1.+4.j]])
>>> abs(z.T-z)
array([[ 0. , 2.23606798, 4.12310563],
[ 2.23606798, 0. , 4.24264069],
[ 4.12310563, 4.24264069, 0. ]])
As a complement, you may want to remove duplicates afterwards, taking the upper triangle :
>>> np.triu(out)
array([[ 0. , 2.23606798, 4.12310563],
[ 0. , 0. , 4.24264069],
[ 0. , 0. , 0. ]])
>>> timeit.timeit('abs(z.T-z)', setup='import numpy as np;z = np.array([[0.+0.j, 2.+1.j, -1.+4.j]])')
4.645645342274779
>>> timeit.timeit('abs(z[..., np.newaxis] - z)', setup='import numpy as np;z = np.array([0.+0.j, 2.+1.j, -1.+4.j])')
5.049334864854522
>>> timeit.timeit('m, n = np.meshgrid(z, z); abs(m-n)', setup='import numpy as np;z = np.array([0.+0.j, 2.+1.j, -1.+4.j])')
22.489568296184686