I have two 2-d numpy arrays with the same dimensions, A and B, and am trying to calculate the row-wise dot product of them. I could do:
np.sum(A * B, axis=1)
Even though it is significantly slower for even moderately sized data, I would use
np.diag(A.dot(B.T))
while you are developing the library, and worry about optimizing it later, once it is headed for a production setting or after the unit tests are written.
To most people who come upon your code, this will be more understandable than einsum
, and it also doesn't require you to embed your calculation in a mini-DSL string passed as an argument to a function call.
I agree that computing the off-diagonal elements is worth avoiding for large cases. It would have to be really, really large for me to care about that, though, and the price of expressing the calculation as an embedded string for einsum
is pretty severe.
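For a concrete sketch of the equivalence (the array contents here are made up purely for illustration):

```python
import numpy as np

A = np.arange(12).reshape(3, 4)
B = np.arange(12, 24).reshape(3, 4)

# Row-wise dot products via the diagonal of the full matrix product...
via_diag = np.diag(A.dot(B.T))
# ...match the elementwise multiply-and-sum from the question.
via_sum = np.sum(A * B, axis=1)

print(np.array_equal(via_diag, via_sum))  # True
```

The downside, as noted above, is that A.dot(B.T) materializes the entire n-by-n product just to read off its diagonal.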
This is a good application for numpy.einsum.
a = np.random.randint(0, 5, size=(6, 4))
b = np.random.randint(0, 5, size=(6, 4))
res1 = np.einsum('ij, ij->i', a, b)
res2 = np.sum(a*b, axis=1)
print(res1)
# [18 6 20 9 16 24]
print(np.allclose(res1, res2))
# True
einsum
also tends to be a bit faster, since it fuses the multiply and the reduction and so avoids allocating the intermediate a*b array.
a = np.random.normal(size=(5000, 1000))
b = np.random.normal(size=(5000, 1000))
%timeit np.einsum('ij, ij->i', a, b)
# 100 loops, best of 3: 8.4 ms per loop
%timeit np.sum(a*b, axis=1)
# 10 loops, best of 3: 28.4 ms per loop
Even faster is inner1d from numpy.core.umath_tests:
Code to reproduce the plot:
import numpy
from numpy.core.umath_tests import inner1d
import perfplot

perfplot.show(
    setup=lambda n: (numpy.random.rand(n, 3), numpy.random.rand(n, 3)),
    kernels=[
        lambda a: numpy.sum(a[0] * a[1], axis=1),
        lambda a: numpy.einsum('ij, ij->i', a[0], a[1]),
        lambda a: inner1d(a[0], a[1]),
    ],
    labels=['sum', 'einsum', 'inner1d'],
    n_range=[2**k for k in range(20)],
    xlabel='len(a), len(b)',
    logx=True,
    logy=True,
)
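Note that numpy.core.umath_tests was always a private module, and inner1d is gone from recent NumPy releases (NumPy 2.0+ instead provides a public np.vecdot). As a version-independent sketch, a batched matrix multiply gives the same row-wise dot products:

```python
import numpy as np

a = np.random.rand(6, 4)
b = np.random.rand(6, 4)

# Treat each row of a as a 1xk matrix and each row of b as a kx1 matrix,
# then batch-multiply over the rows; the result has shape (6, 1, 1),
# so squeeze the two singleton axes away.
row_dots = (a[:, None, :] @ b[:, :, None]).squeeze(axis=(1, 2))

print(np.allclose(row_dots, np.sum(a * b, axis=1)))  # True
```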