It is my understanding that the outer product of a vector with its transpose is symmetric in value. Does NumPy take this into account, and only do the multiplications for one triangle of the result?
Exploring some alternatives:
In [162]: x=np.arange(100)
In [163]: np.outer(x,x)
Out[163]:
array([[   0,    0,    0, ...,    0,    0,    0],
       [   0,    1,    2, ...,   97,   98,   99],
       [   0,    2,    4, ...,  194,  196,  198],
       ...,
       [   0,   97,  194, ..., 9409, 9506, 9603],
       [   0,   98,  196, ..., 9506, 9604, 9702],
       [   0,   99,  198, ..., 9603, 9702, 9801]])
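As a quick check that the result really is symmetric (this only confirms the premise; it says nothing about how NumPy computed it):

M = np.outer(x, x)
np.array_equal(M, M.T)    # True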
In [164]: x1=x[:,None]
In [165]: x1*x1.T
Out[165]:
array([[   0,    0,    0, ...,    0,    0,    0],
       [   0,    1,    2, ...,   97,   98,   99],
       [   0,    2,    4, ...,  194,  196,  198],
       ...,
       [   0,   97,  194, ..., 9409, 9506, 9603],
       [   0,   98,  196, ..., 9506, 9604, 9702],
       [   0,   99,  198, ..., 9603, 9702, 9801]])
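The broadcasting version works because x1 has shape (100,1) and x1.T has shape (1,100); elementwise multiplication broadcasts both out to (100,100). A quick sanity check that it matches outer:

x1.shape, x1.T.shape                         # ((100, 1), (1, 100))
np.array_equal(x1 * x1.T, np.outer(x, x))    # True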
In [166]: np.dot(x1,x1.T)
Out[166]:
array([[   0,    0,    0, ...,    0,    0,    0],
       [   0,    1,    2, ...,   97,   98,   99],
       [   0,    2,    4, ...,  194,  196,  198],
       ...,
       [   0,   97,  194, ..., 9409, 9506, 9603],
       [   0,   98,  196, ..., 9506, 9604, 9702],
       [   0,   99,  198, ..., 9603, 9702, 9801]])
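Note that dot needs the 2-D column/row views here; applied to the 1-D x it computes the inner product instead:

np.dot(x, x)              # 328350, the scalar inner product
np.dot(x1, x1.T).shape    # (100, 100), the rank-1 matrix product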
Comparing their times:
In [167]: timeit np.outer(x,x)
40.8 µs ± 63.1 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [168]: timeit x1*x1.T
36.3 µs ± 22 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [169]: timeit np.dot(x1,x1.T)
60.7 µs ± 6.86 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Is dot using a transpose shortcut? I don't think so, or if it does, it doesn't help in this case. I'm a little surprised that dot is slower. Precomputing the transpose doesn't change the time:
In [170]: x2=x1.T
In [171]: timeit np.dot(x1,x2)
61.1 µs ± 30 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Another method:
In [172]: timeit np.einsum('i,j',x,x)
28.3 µs ± 19.4 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
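With no output subscripts, 'i,j' defaults to 'i,j->ij', which is exactly the outer product; writing it out explicitly gives the same result:

np.array_equal(np.einsum('i,j->ij', x, x), np.outer(x, x))    # True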
einsum with x1 and x2 has the same times.
Interesting that matmul does as well as einsum in this case (maybe einsum is delegating to matmul?)
In [178]: timeit x1@x2
27.3 µs ± 1.09 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [179]: timeit x1@x1.T
27.2 µs ± 14.2 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
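Whatever the timing differences, all of these methods produce the same array; a quick consistency check:

M = np.outer(x, x)
for alt in (x1 * x1.T, np.dot(x1, x1.T), np.einsum('i,j', x, x), x1 @ x1.T):
    assert np.array_equal(M, alt)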
Numpy efficient matrix self-multiplication (gram matrix) demonstrates how dot can save time by being clever (for a 1000x1000 array).
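A minimal sketch of that kind of experiment: compare A@A.T, where the transpose relationship is detectable, against A@B, where B holds the same values in a separate buffer. If dot dispatches to a symmetric BLAS routine (syrk) in the first case, it should be noticeably faster; the actual ratio depends on the BLAS build:

import numpy as np
from timeit import timeit

A = np.random.rand(1000, 1000)
B = A.T.copy()    # same values as A.T, but a fresh buffer

# If dot detects that the second operand is a transposed view of the
# first, it can skip roughly half the multiplications.
print(timeit(lambda: A @ A.T, number=10))
print(timeit(lambda: A @ B, number=10))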
As discussed in the links, dot can detect when one argument is the transpose of the other (probably by checking the data buffer pointer, shape, and strides), and can use a BLAS function optimized for symmetric calculations. But I don't see evidence of outer doing that, and it's unlikely that broadcasted multiplication would take such a step.
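For illustration, here's a hypothetical version of such a detection test in pure NumPy (the name is_transpose_pair and the exact conditions are my sketch, not NumPy's actual dispatch logic). A transposed view shares its data pointer with the original and has reversed shape and strides:

import numpy as np

def is_transpose_pair(a, b):
    # Hypothetical sketch of the kind of test dot could apply:
    # same data buffer, with shape and strides reversed.
    ptr_a = a.__array_interface__['data'][0]
    ptr_b = b.__array_interface__['data'][0]
    return ptr_a == ptr_b and a.shape == b.shape[::-1] and a.strides == b.strides[::-1]

x1 = np.arange(100)[:, None]
print(is_transpose_pair(x1, x1.T))          # True: a view of the same buffer
print(is_transpose_pair(x1, x1.T.copy()))   # False: a separate buffer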