calculate difference between all combinations of entries in a vector

Question

I have a numpy 1D array of z values, and I want to calculate the difference between all combinations of the entries, with the output as a square matrix.

I know how to calculate this as a distance between all combinations of the points using cdist, but that does not give me the sign:

So, for example, if my z vector is [1, 5, 8]:

import numpy as np
from scipy.spatial.distance import cdist

z = np.array([1, 5, 8])
# cdist expects 2D input, so treat each value as the point (z, 0)
z2 = np.column_stack((z, np.zeros(len(z))))
cdist(z2, z2)

gives me:

array([[0., 4., 7.],
       [4., 0., 3.],
       [7., 3., 0.]])

but I want the signed differences instead:

array([[ 0.,  4.,  7.],
       [-4.,  0.,  3.],
       [-7., -3.,  0.]])
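Written entry by entry, the desired matrix in this example is

$$ D_{ij} = z_j - z_i, \qquad i, j = 0, 1, \dots, n-1, $$

so it is antisymmetric ($D_{ji} = -D_{ij}$), and $|D_{ij}|$ is exactly the cdist output shown above.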

I thought about fudging this by using np.tril_indices to flip the sign of the lower triangle, but that won't work: my operation needs the pairs to be differenced in a consistent order (i.e. if I perform this on two or more vectors, each pair must always be compared in the same order), whereas flipping the sign would always put positive differences in the upper right and negative ones in the lower left, regardless of the actual values.
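To make the concern concrete, here is a minimal sketch of the tril_indices idea. It assumes the unsigned matrix is the cdist output, reproduced here with np.abs so that only NumPy is needed (for 1D points the values are the same). tril_flip and signed_diff are hypothetical helper names. For a sorted vector the trick happens to match the signed differences, but for an unsorted one it does not.

```python
import numpy as np

def tril_flip(z):
    """Hypothetical helper: unsigned |z[j] - z[i]| with the lower triangle negated."""
    d = np.abs(z - z[:, np.newaxis])             # same values as cdist for 1D points
    rows, cols = np.tril_indices(len(z), k=-1)   # strictly-lower-triangle indices
    d[rows, cols] *= -1                          # flip the sign below the diagonal
    return d

def signed_diff(z):
    """Hypothetical helper: out[i, j] = z[j] - z[i], the convention in the question."""
    return z - z[:, np.newaxis]

sorted_z = np.array([1, 5, 8])
unsorted_z = np.array([5, 1, 8])

print(np.array_equal(tril_flip(sorted_z), signed_diff(sorted_z)))      # True
print(np.array_equal(tril_flip(unsorted_z), signed_diff(unsorted_z)))  # False
```

With [5, 1, 8], the true difference z[1] - z[0] is -4, but the flipped matrix still shows +4 in the upper triangle, which is exactly the inconsistency described above.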

Answer 1:

In [29]: z = np.array([1, 5, 8])

In [30]: -np.subtract.outer(z, z)
Out[30]: 
array([[ 0,  4,  7],
       [-4,  0,  3],
       [-7, -3,  0]])

(Drop the minus sign if you want the opposite convention, out[i, j] = z[i] - z[j].)
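np.subtract.outer is the outer method that every NumPy binary ufunc provides; for 1D inputs, np.subtract.outer(a, b)[i, j] is a[i] - b[j]. The short check below, using the example vector from the question, confirms that this matches plain broadcasting and that the leading minus sign is equivalent to a transpose, since the difference matrix is antisymmetric.

```python
import numpy as np

z = np.array([1, 5, 8])

# np.subtract.outer(a, b)[i, j] == a[i] - b[j]
outer = np.subtract.outer(z, z)

# The same matrix via broadcasting: column (n, 1) minus row (1, n)
assert np.array_equal(outer, z[:, np.newaxis] - z[np.newaxis, :])

# Negating gives the convention asked for: wanted[i, j] == z[j] - z[i]
wanted = -outer

# The matrix is antisymmetric, so negating it is the same as transposing it
assert np.array_equal(wanted, outer.T)

print(wanted)
```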

Answer 2:

A simple one-line solution using NumPy array broadcasting:

import numpy as np

z = np.array([1, 5, 8])
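# Broadcast the row z (shape (n,)) against the column z.reshape(-1, 1)
# (shape (n, 1)): entry [i, j] of the (n, n) result is z[j] - z[i]
z - z.reshape(-1, 1)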

Output:

array([[ 0,  4,  7],
       [-4,  0,  3],
       [-7, -3,  0]])
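The same broadcasting idea also addresses the question's concern about processing several vectors with a consistent pair order. The sketch below uses pairwise_diff, a hypothetical helper name (not a NumPy function), which inserts new axes with np.newaxis so that it accepts either one vector of shape (n,) or a stack of vectors of shape (m, n). In both cases out[..., i, j] = z[..., j] - z[..., i].

```python
import numpy as np

def pairwise_diff(z):
    """Hypothetical helper: signed differences along the last axis.

    out[..., i, j] = z[..., j] - z[..., i]
    Shape (n,) -> (n, n); shape (m, n) -> (m, n, n).
    """
    z = np.asarray(z)
    # (..., 1, n) minus (..., n, 1) broadcasts to (..., n, n)
    return z[..., np.newaxis, :] - z[..., :, np.newaxis]

single = pairwise_diff([1, 5, 8])
stack = pairwise_diff([[1, 5, 8],
                       [2, 2, 9]])

print(single.shape)  # (3, 3)
print(stack.shape)   # (2, 3, 3)
print(stack[1])
```

Because the sign depends only on position (column minus row), every vector in the stack has its pairs compared in the same order, whatever its values.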

Answer 3:

I've worked out that I can get the answer I want with a nested list comprehension (a double loop), although I'm not sure it is efficient enough for very large arrays:

np.array([j-i for i in z for j in z]).reshape(len(z),len(z))

Output:

array([[ 0,  4,  7],
       [-4,  0,  3],
       [-7, -3,  0]])
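As a sanity check on an assumed random test vector, the sketch below confirms that the loop version produces the same matrix as the broadcasting answer. The outer loop runs over rows and the inner loop over columns, so both compute z[j] - z[i] at position [i, j].

```python
import numpy as np

rng = np.random.default_rng(0)
z = rng.uniform(size=50)  # arbitrary test data

# Loop version from this answer: outer loop = row, inner loop = column
loop = np.array([j - i for i in z for j in z]).reshape(len(z), len(z))

# Broadcasting version: row vector minus column vector
vectorized = z - z[:, np.newaxis]

print(np.allclose(loop, vectorized))  # True
```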

EDIT: indeed, for a 5000-element array the other two solutions are more than 40 times faster:

python3 -m timeit -s "import numpy as np" -s "z=np.random.uniform(size=5000)" "z-z.reshape(-1,1)"
2 loops, best of 5: 119 msec per loop

python3 -m timeit -s "import numpy as np" -s "z=np.random.uniform(size=5000)" "np.subtract.outer(z, z)"
2 loops, best of 5: 118 msec per loop

python3 -m timeit -s "import numpy as np" -s "z=np.random.uniform(size=5000)" "np.array([j-i for i in z for j in z]).reshape(len(z),len(z))"
1 loop, best of 5: 5.18 sec per loop
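The loop version is slow largely because it performs all n² subtractions in the Python interpreter (25 million for n = 5000), whereas the vectorized versions do the work in compiled code. Memory is the other limit: the full n × n float64 result for n = 5000 takes 5000² × 8 bytes = 200 MB. If you prefer to run the comparison from Python rather than the shell, here is a minimal sketch using the standard timeit module. compare is a hypothetical helper, and the default n=500 is an arbitrary choice that keeps the list comprehension quick; no particular timings are implied.

```python
import timeit

import numpy as np

def compare(n=500, repeat=3):
    """Hypothetical helper: best time in seconds for each approach on a random vector."""
    z = np.random.uniform(size=n)
    candidates = {
        "broadcast": lambda: z - z.reshape(-1, 1),
        "subtract.outer": lambda: np.subtract.outer(z, z),
        "list comprehension": lambda: np.array(
            [j - i for i in z for j in z]
        ).reshape(len(z), len(z)),
    }
    # number=1: each measurement runs the candidate once; keep the fastest run
    return {
        name: min(timeit.repeat(fn, number=1, repeat=repeat))
        for name, fn in candidates.items()
    }

if __name__ == "__main__":
    for name, seconds in compare().items():
        print(f"{name:>20}: {seconds * 1e3:.2f} ms")
```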

Source: https://stackoverflow.com/questions/59147730/calculate-difference-between-all-combinations-of-entries-in-a-vector

Tags: python, algorithm, scipy, distance, array-broadcasting