Question
I have a 4D NumPy array of shape (15, 2, 320, 320). In other words, at each position of the [320 x 320] grid there is a [15 x 2] matrix. Now, I would like to compute the dot product at each position of the [320 x 320] grid, then extract the diagonal of the result.
Currently, I am using two "for" loops, and the code works correctly (see the snippet below). However, the computation is too slow when I process large data. Can anyone show me how to vectorize the computation without the loops?
import numpy as np

A = np.random.rand(15, 2, 320, 320)
B = np.zeros((2, 320, 320))  # store the final results
for row in range(320):
    for col in range(320):
        C = np.dot(A[:, :, row, col].T, A[:, :, row, col])  # the shape of C is [2 x 2]
        C = np.diag(C)
        B[:, row, col] = C
Any help or suggestions would be greatly appreciated!
Answer 1:
It's kinda funny sometimes how tensor algebra works out. It's just (A*A).sum(0):
A = np.random.rand(15, 2, 320, 320)
B1 = np.zeros((2, 320, 320))  # store the final results
for row in range(320):
    for col in range(320):
        C = np.dot(A[:, :, row, col].T, A[:, :, row, col])  # the shape of C is [2 x 2]
        C = np.diag(C)
        B1[:, row, col] = C
B2 = (A*A).sum(0)
np.allclose(B1, B2)
# True
(Credit to @Ehsan and @hpaulj, who came up with the einsum answer that suggests this formulation.)
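To see why the diagonal of the dot product reduces to an elementwise square-and-sum, here is a minimal sketch; the small matrix `M` stands in for one `A[:, :, row, col]` slice and its shape is chosen only for brevity:

```python
import numpy as np

# diag(M.T @ M)[j] = sum over i of M[i, j]**2,
# i.e. the squared column norms of M.
M = np.arange(6.0).reshape(3, 2)  # stand-in for one [15 x 2] slice
lhs = np.diag(M.T @ M)            # diagonal of the 2x2 dot product
rhs = (M * M).sum(0)              # elementwise square, summed over rows
print(np.allclose(lhs, rhs))      # True
```

Applying this per-slice identity across the whole grid at once is exactly `(A*A).sum(0)`.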
Answer 2:
EDIT: as suggested by @hpaulj in the comments, you can use a shorter version of the code below. It multiplies A by itself elementwise and then sums over the first index i (i.e. the rows), which is equivalent to taking the diagonal of the dot product:
np.einsum('ijkl,ijkl->jkl', A, A)
You can vectorize it using the one-liner below. Of course, you could substitute a combination of transpose and dot product for np.einsum, but I think the explicit indices in np.einsum make the code cleaner and more readable. np.diagonal takes the diagonal over the first two axes and attaches it as the last dimension, hence you need the transpose at the end to bring that dimension back to the front where you want it:
np.diagonal(np.einsum('ijkl,imkl->jmkl', A, A)).transpose(2, 0, 1)
Test that the output matches your code's output B:
np.allclose(np.diagonal(np.einsum('ijkl,imkl->jmkl', A, A)).transpose(2, 0, 1), B)
# True
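As a quick end-to-end check, both einsum forms can be compared against the loop-based reference from the question. The reduced shape (5, 2, 4, 4) here is only to keep the example fast; it is not from the question:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((5, 2, 4, 4))  # smaller stand-in for (15, 2, 320, 320)

# Loop-based reference, as in the question
B = np.zeros((2, 4, 4))
for row in range(4):
    for col in range(4):
        B[:, row, col] = np.diag(A[:, :, row, col].T @ A[:, :, row, col])

# Full dot product + diagonal, and the shorter diagonal-only einsum
full = np.diagonal(np.einsum('ijkl,imkl->jmkl', A, A)).transpose(2, 0, 1)
short = np.einsum('ijkl,ijkl->jkl', A, A)
print(np.allclose(B, full), np.allclose(B, short))  # True True
```

The shorter form never materializes the off-diagonal entries, so it also does less work.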
Answer 3:
This should work:
A_T_1 = np.transpose(A, (2,3,0,1)) # Shape (320,320,15,2)
A_T_2 = np.transpose(A, (2,3,1,0)) # Shape (320,320,2,15)
C_all = A_T_2 @ A_T_1 # Shape (320,320,2,2)
D_all = np.diagonal(C_all, axis1=2, axis2=3) # Shape (320,320,2)
B = np.transpose(D_all, (2,0,1)) # Shape (2,320,320)
Note:
- transpose() only returns a view of its input array, so it doesn't do any expensive data copy.
- Here you don't need to initialize B = np.zeros((2, 320, 320)) beforehand.
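The batched-matmul route above can be verified against the elementwise identity on a small array; the reduced shape (5, 2, 4, 4) is only for speed and is not from the question:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.random((5, 2, 4, 4))  # smaller stand-in for (15, 2, 320, 320)

A_T_1 = np.transpose(A, (2, 3, 0, 1))  # (4, 4, 5, 2): each grid cell holds A[:, :, r, c]
A_T_2 = np.transpose(A, (2, 3, 1, 0))  # (4, 4, 2, 5): each grid cell holds its transpose
C_all = A_T_2 @ A_T_1                  # (4, 4, 2, 2): batched matmul over the grid
B_mm = np.transpose(np.diagonal(C_all, axis1=2, axis2=3), (2, 0, 1))  # (2, 4, 4)

print(np.allclose(B_mm, (A * A).sum(0)))  # True
```

The `@` operator broadcasts the matrix product over the leading grid axes, so each cell gets the same [2 x 15] @ [15 x 2] product the loop computed.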
Source: https://stackoverflow.com/questions/65696531/dot-product-two-4d-numpy-array