Question
I have a numpy 1D array of z values, and I want to calculate the difference between all combinations of the entries, with the output as a square matrix.
I know how to calculate this as a distance between all combinations of the points using cdist, but that does not give me the sign:
For example, if my z vector is [1, 5, 8]:
import numpy as np
from scipy.spatial.distance import cdist
z = np.array([1, 5, 8])
# cdist expects 2-D points, so pad z with a zero second coordinate
z2 = np.column_stack((z, np.zeros(len(z))))
cdist(z2, z2)
gives me:
array([[0., 4., 7.],
       [4., 0., 3.],
       [7., 3., 0.]])
but I want the signs included, giving:
array([[ 0.,  4.,  7.],
       [-4.,  0.,  3.],
       [-7., -3.,  0.]])
I thought about fudging things by using np.tril_indices to flip the sign of the lower triangle, but this won't work. I need the pairs to be differenced in a consistent order for my operation (i.e. if I perform this on two or more vectors, the pairs must always be compared in the same order), whereas flipping the sign would always give positive differences in the upper right and negative ones in the lower left, regardless of which entry is larger.
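A small counterexample makes the problem with flipping the lower triangle concrete. With an unsorted vector such as [5, 1, 8], the signs produced by the triangle trick no longer match the actual differences z[j] - z[i]. This is only an illustration: to avoid a SciPy dependency it builds the unsigned distances with np.abs, which gives the same values cdist returns for the zero-padded points above, and the variable names are hypothetical.

```python
import numpy as np

z = np.array([5, 1, 8])  # deliberately not sorted
n = len(z)

# Unsigned distances |z[i] - z[j]| (same values as cdist on the padded points)
unsigned = np.abs(z[:, np.newaxis] - z[np.newaxis, :])

# The "fudge": negate every entry strictly below the diagonal
fudged = unsigned.copy()
fudged[np.tril_indices(n, k=-1)] *= -1

# What is actually wanted: wanted[i, j] = z[j] - z[i]
wanted = np.array([[z[j] - z[i] for j in range(n)] for i in range(n)])

print(fudged)
print(wanted)
print(np.array_equal(fudged, wanted))  # False: entries [0, 1] and [1, 0] have the wrong sign
```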
Answer 1:
In [29]: z = np.array([1, 5, 8])
In [30]: -np.subtract.outer(z, z)
Out[30]:
array([[ 0,  4,  7],
       [-4,  0,  3],
       [-7, -3,  0]])
(Drop the minus sign if you don't care about the sign convention.)
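np.subtract.outer(a, b) returns a matrix whose entry [i, j] is a[i] - b[j], so for a single vector the leading minus sign simply transposes the result into the z[j] - z[i] convention. The sketch below checks this and shows that two different vectors work the same way; the second vector `w` is a made-up example.

```python
import numpy as np

z = np.array([1, 5, 8])

outer = np.subtract.outer(z, z)  # outer[i, j] = z[i] - z[j]
print(outer)
# Negating equals transposing, because the matrix is antisymmetric
print(np.array_equal(-outer, outer.T))  # True

# With two different vectors the convention is unchanged: d[i, j] = z[i] - w[j]
w = np.array([2, 3])  # hypothetical second vector
d = np.subtract.outer(z, w)
print(d.shape)  # (3, 2)
print(d)
```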
Answer 2:
A simple one-line solution using NumPy array broadcasting:
import numpy as np
z = np.array([1, 5, 8])
# Row vector minus column vector: entry [i, j] is z[j] - z[i]
z - z.reshape(-1, 1)
Output:
array([[ 0,  4,  7],
       [-4,  0,  3],
       [-7, -3,  0]])
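The broadcasting works because z has shape (n,) and z.reshape(-1, 1) has shape (n, 1); NumPy stretches both to (n, n), so entry [i, j] is z[j] - z[i]. The same idea extends to several vectors at once, which keeps the pair order identical for every vector, as the question requires. This is a sketch assuming the vectors have equal length and are stacked as the rows of a 2-D array `Z` (a hypothetical layout):

```python
import numpy as np

z = np.array([1, 5, 8])
print(z.shape, z.reshape(-1, 1).shape)  # (3,) (3, 1)

# Equivalent spelling with np.newaxis: row vector minus column vector
diff = z[np.newaxis, :] - z[:, np.newaxis]  # diff[i, j] = z[j] - z[i]

# Several vectors of equal length, one per row (hypothetical data)
Z = np.array([[1, 5, 8],
              [2, 0, 4]])

# Shape (m, 1, n) minus shape (m, n, 1) broadcasts to (m, n, n):
# batch[k, i, j] = Z[k, j] - Z[k, i]
batch = Z[:, np.newaxis, :] - Z[:, :, np.newaxis]
print(batch.shape)  # (2, 3, 3)
print(np.array_equal(batch[0], diff))  # True
```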
Answer 3:
I've worked out that I can get the answer I want with a double loop in a list comprehension, although I'm not sure it is the most efficient approach for very large arrays:
np.array([j-i for i in z for j in z]).reshape(len(z),len(z))
Output:
array([[ 0,  4,  7],
       [-4,  0,  3],
       [-7, -3,  0]])
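Before comparing speed, it is worth confirming that the three approaches agree. A quick check on random input, assuming only NumPy; the seed and array size are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed
z = rng.uniform(size=50)
n = len(z)

a1 = -np.subtract.outer(z, z)                               # Answer 1
a2 = z - z.reshape(-1, 1)                                   # Answer 2
a3 = np.array([j - i for i in z for j in z]).reshape(n, n)  # Answer 3

# All three use the convention result[i, j] = z[j] - z[i]
print(np.allclose(a1, a2), np.allclose(a2, a3))  # True True
```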
EDIT: so indeed the other two solutions are much faster, roughly 40 times for a 5000-element array (about 0.12 s vs. 5.18 s per loop):
python3 -m timeit -s "import numpy as np" -s "z=np.random.uniform(size=5000)" "z-z.reshape(-1,1)"
2 loops, best of 5: 119 msec per loop
python3 -m timeit -s "import numpy as np" -s "z=np.random.uniform(size=5000)" "np.subtract.outer(z, z)"
2 loops, best of 5: 118 msec per loop
python3 -m timeit -s "import numpy as np" -s "z=np.random.uniform(size=5000)" "np.array([j-i for i in z for j in z]).reshape(len(z),len(z))"
1 loop, best of 5: 5.18 sec per loop
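The same comparison can also be run from inside Python with the standard-library timeit module. This is a sketch rather than a rigorous benchmark: the array size (500, smaller than above so the loop version finishes quickly) and the repeat counts are arbitrary, and absolute timings will vary by machine. Keep in mind too that for 5000 entries the float64 result matrix alone takes 5000 × 5000 × 8 bytes = 200 MB, so memory may become the limiting factor before speed does.

```python
import timeit

import numpy as np

z = np.random.uniform(size=500)  # smaller than the 5000 above so the loop version stays quick

candidates = {
    "broadcasting": lambda: z - z.reshape(-1, 1),
    "subtract.outer": lambda: np.subtract.outer(z, z),
    "list comprehension": lambda: np.array([j - i for i in z for j in z]).reshape(len(z), len(z)),
}

results = {}
for name, func in candidates.items():
    # Best of 5 repeats, one call per repeat
    best = min(timeit.repeat(func, repeat=5, number=1))
    results[name] = best
    print(f"{name:>20}: {best * 1e3:.2f} ms per call")
```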
Source: https://stackoverflow.com/questions/59147730/calculate-difference-between-all-combinations-of-entries-in-a-vector