Numpy Performance - Outer Product of a vector with its transpose

Asked by 青春惊慌失措, 2021-01-15 17:22

It is my understanding that the outer product of a vector with its transpose is symmetric.

Does Numpy take this into account and only do the multiplications needed for one triangle of the result?
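
For example (a small illustrative check that the result is symmetric):

    import numpy as np

    x = np.random.rand(5)
    M = np.outer(x, x)
    print(np.allclose(M, M.T))   # True: M[i, j] == M[j, i]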

1 Answer
  • 2021-01-15 17:53

    Exploring some alternatives:

    In [162]: x=np.arange(100)
    In [163]: np.outer(x,x)
    Out[163]: 
    array([[   0,    0,    0, ...,    0,    0,    0],
           [   0,    1,    2, ...,   97,   98,   99],
           [   0,    2,    4, ...,  194,  196,  198],
           ...,
           [   0,   97,  194, ..., 9409, 9506, 9603],
           [   0,   98,  196, ..., 9506, 9604, 9702],
           [   0,   99,  198, ..., 9603, 9702, 9801]])
    In [164]: x1=x[:,None]
    In [165]: x1*x1.T
    Out[165]: 
    array([[   0,    0,    0, ...,    0,    0,    0],
           [   0,    1,    2, ...,   97,   98,   99],
           [   0,    2,    4, ...,  194,  196,  198],
           ...,
           [   0,   97,  194, ..., 9409, 9506, 9603],
           [   0,   98,  196, ..., 9506, 9604, 9702],
           [   0,   99,  198, ..., 9603, 9702, 9801]])
    In [166]: np.dot(x1,x1.T)
    Out[166]: 
    array([[   0,    0,    0, ...,    0,    0,    0],
           [   0,    1,    2, ...,   97,   98,   99],
           [   0,    2,    4, ...,  194,  196,  198],
           ...,
           [   0,   97,  194, ..., 9409, 9506, 9603],
           [   0,   98,  196, ..., 9506, 9604, 9702],
           [   0,   99,  198, ..., 9603, 9702, 9801]])
    

    Comparing their times:

    In [167]: timeit np.outer(x,x)
    40.8 µs ± 63.1 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
    In [168]: timeit x1*x1.T
    36.3 µs ± 22 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
    In [169]: timeit np.dot(x1,x1.T)
    60.7 µs ± 6.86 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
    

    Is dot using a transpose shortcut? I don't think so, or if it does, it doesn't help in this case. I'm a little surprised that dot is slower.

    In [170]: x2=x1.T
    In [171]: timeit np.dot(x1,x2)
    61.1 µs ± 30 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
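
    A quick check (not part of the original timings): x2 is still just a view of the same buffer as x1, so detecting the "transpose of the same data" case would at least be possible in principle:

    # x, x1 and x2 all refer to the same underlying buffer; no data is copied.
    print(np.shares_memory(x1, x2))            # True
    print(x1.__array_interface__['data'][0] ==
          x2.__array_interface__['data'][0])   # True: same starting address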
    

    Another method:

    In [172]: timeit np.einsum('i,j',x,x)
    28.3 µs ± 19.4 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
    

    einsum with x1 and x2 gives the same timings.

    Interesting that matmul does as well as einsum in this case (maybe einsum is delegating to matmul?).

    In [178]: timeit x1@x2
    27.3 µs ± 1.09 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
    In [179]: timeit x1@x1.T
    27.2 µs ± 14.2 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
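
    A side note (a small check, not in the original): with the 1-D x, @ gives the inner product, so the (n,1) and (1,n) views are what make @ produce the outer product; einsum works on the 1-D inputs directly.

    print(x @ x)                                                    # 328350, a scalar (inner product)
    print(np.array_equal(x1 @ x2, np.outer(x, x)))                  # True
    print(np.array_equal(np.einsum('i,j', x, x), np.outer(x, x)))   # True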
    

    Numpy efficient matrix self-multiplication (gram matrix) demonstrates how dot can save time by being clever (for a 1000x1000 array).

    As discussed in the links, dot can detect when one argument is the transpose of the other (probably by checking the data buffer pointer, shape, and strides), and can use a BLAS function optimized for symmetric calculations. But I don't see evidence of outer doing that. And it's unlikely that broadcasted multiplication would take such a step.
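
    For reference, here is a sketch of calling a symmetric rank-k BLAS routine directly. It uses scipy (not used anywhere above, so treat it as an optional illustration); scipy.linalg.blas.dsyrk computes alpha * a @ a.T but only writes one triangle of the result. Whether that pays off for a rank-1 update of this size is a separate question.

    from scipy.linalg import blas   # assumes scipy is installed

    xf = np.arange(100.0)           # BLAS routines want floating-point data
    a = xf[:, None]                 # shape (100, 1)

    # dsyrk writes only the upper triangle by default; the other triangle stays zero.
    c = blas.dsyrk(1.0, a)
    full = c + c.T - np.diag(np.diag(c))          # rebuild the full symmetric matrix
    print(np.allclose(full, np.outer(xf, xf)))    # True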
