Fastest Way to Find the Dot Product of a Large Matrix of Vectors

Submitted by 别说谁变了你拦得住时间么 on 2020-06-29 03:44:27

Question


I am looking for suggestions on the most efficient way to solve the following problem:

I have two arrays called A and B. They are both of shape NxNx3. They represent two 2D matrices of positions, where each position is a vector of x, y, and z coordinates.

I want to create a new array, called C, of shape NxN, where C[i, j] is the dot product of the vectors A[i, j] and B[i, j].

Here are the solutions I've come up with so far. The first uses numpy's einsum function (which is beautifully described here). The second uses numpy's broadcasting rules along with its sum function.

>>> import numpy as np
>>> A = np.random.randint(0, 10, (100, 100, 3))
>>> B = np.random.randint(0, 10, (100, 100, 3))
>>> C = np.einsum("ijk,ijk->ij", A, B)
>>> D = np.sum(A * B, axis=2)
>>> np.allclose(C, D)
True

Is there a faster way? I've heard murmurs that numpy's tensordot function can be blazing fast, but I've always struggled to understand it. What about using numpy's dot or inner functions?

For some context, the A and B arrays will typically have between 100 and 1000 elements.

Any guidance is much appreciated!


Answer 1:


With a bit of reshaping, we can use matmul. The idea is to treat the first 2 dimensions as 'batch' dimensions and do the dot product on the last:

In [278]: E = A[...,None,:]@B[...,:,None]                                       
In [279]: E.shape                                                               
Out[279]: (100, 100, 1, 1)
In [280]: E = np.squeeze(A[...,None,:]@B[...,:,None])                           
In [281]: np.allclose(C,E)                                                      
Out[281]: True
In [282]: timeit E = np.squeeze(A[...,None,:]@B[...,:,None])                    
130 µs ± 2.01 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [283]: timeit C = np.einsum("ijk,ijk->ij", A, B)                             
90.2 µs ± 1.53 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Comparative timings can be a bit tricky. In current versions, einsum can take different routes depending on the dimensions. In some cases it appears to delegate the task to matmul (or at least the same underlying BLAS-like code). While it's nice that einsum is faster in this test, I wouldn't generalize that.
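As a rough illustration (assuming NumPy 1.12+, which added the `optimize` keyword), `np.einsum_path` reports which contraction route einsum would pick for a given subscript string, without doing the computation:

```python
import numpy as np

A = np.random.randint(0, 10, (100, 100, 3))
B = np.random.randint(0, 10, (100, 100, 3))

# Ask einsum which contraction path it would choose for this problem;
# the returned list and info string summarize the planned route.
path, info = np.einsum_path("ijk,ijk->ij", A, B, optimize="optimal")
print(path)

# The optimized and unoptimized routes produce identical results.
C_opt = np.einsum("ijk,ijk->ij", A, B, optimize=True)
C_plain = np.einsum("ijk,ijk->ij", A, B)
print(np.allclose(C_opt, C_plain))  # True
```

For a two-operand contraction like this there is little room for path optimization, but `einsum_path` becomes useful when chaining three or more operands.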

tensordot just reshapes (and if needed transposes) the arrays so it can apply the ordinary 2d np.dot. It doesn't actually work here, because you want to treat the first 2 axes as a 'batch', whereas tensordot performs an outer product on them.
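To see the mismatch concretely, here's a quick sketch (using a small N so the 4-D intermediate stays tiny): tensordot contracts the shared last axis but pairs every position of A with every position of B, so the result is NxNxNxN rather than the NxN array we want.

```python
import numpy as np

N = 4  # small size just to illustrate the shapes
A = np.random.randint(0, 10, (N, N, 3))
B = np.random.randint(0, 10, (N, N, 3))

# tensordot contracts the shared last axis but forms an outer product
# over the batch axes: T[i, j, k, l] = A[i, j] . B[k, l].
T = np.tensordot(A, B, axes=([2], [2]))
print(T.shape)  # (4, 4, 4, 4), not (4, 4)

# Only the "diagonal" entries T[i, j, i, j] are the per-position
# dot products the question asks for.
idx = np.arange(N)
C = T[idx[:, None], idx, idx[:, None], idx]
print(np.allclose(C, np.einsum("ijk,ijk->ij", A, B)))  # True
```

Extracting that diagonal after computing the full outer product wastes O(N^4) work and memory, which is why einsum or matmul with batch dimensions is the right tool here.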



Source: https://stackoverflow.com/questions/62603377/fastest-way-to-find-the-dot-product-of-a-large-matrix-of-vectors
