Question
I want to multiply an n-dim stack of m×m matrices by an n-dim stack of length-m vectors, so that the nth column of the resulting m×n array contains the dot product of the nth matrix with the nth vector:
vec1=np.array([0,0.5,1,0.5]); vec2=np.array([2,0.5,1,0.5])
vec=np.transpose(n.stack((vec1,vec2)))
mat = np.moveaxis(n.array([[[0,1,2,3],[0,1,2,3],[0,1,2,3],[0,1,2,3]],[[-1,2.,0,1.],[0,0,-1,2.],[0,1,-1,2.],[1,0.1,1,1]]]),0,2)
outvec=np.zeros((4,2))
for i in range(2):
    outvec[:,i] = np.dot(mat[:,:,i], vec[:,i])
Inspired by this post, Element wise dot product of matrices and vectors, I have tried all sorts of permutations of index combinations in einsum, and have found that
np.einsum('ijk,jk->ik',mat,vec)
gives the correct result.
Unfortunately I really do not understand this - I assumed that repeating the index k in the 'ijk,jk' part means that I multiply AND sum over k. I've tried to read the documentation (https://docs.scipy.org/doc/numpy-1.15.1/reference/generated/numpy.einsum.html), but I still don't understand.
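For instance, with a plain 1-D array I can see the two behaviours, even if I can't yet articulate the rule (a small self-contained experiment):
x = np.array([1., 2., 3.])
np.einsum('k,k', x, x)      # -> 14.0 (multiplied and summed over k)
np.einsum('k,k->k', x, x)   # -> array([1., 4., 9.]) (multiplied only)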
(My previous attempts included:
np.einsum('ijk,il->ik', mat, vec)
I'm not even sure what this means. What happens to the index l when I drop it?)
Thanks in advance!
Answer 1:
In [321]: vec1=np.array([0,0.5,1,0.5]); vec2=np.array([2,0.5,1,0.5])
...: vec=np.transpose(np.stack((vec1,vec2)))
In [322]: vec1.shape
Out[322]: (4,)
In [323]: vec.shape
Out[323]: (4, 2)
A nice thing about the stack function is that we can specify an axis, skipping the transpose:
In [324]: np.stack((vec1,vec2), axis=1).shape
Out[324]: (4, 2)
Why the mix of np. and n.? As posted, the code raises NameError: name 'n' is not defined. That kind of thing almost sends me away.
In [326]: mat = np.moveaxis(np.array([[[0,1,2,3],[0,1,2,3],[0,1,2,3],[0,1,2,3]],[[-1,2.,0
...: ,1.],[0,0,-1,2.],[0,1,-1,2.],[1,0.1,1,1]]]),0,2)
In [327]: mat.shape
Out[327]: (4, 4, 2)
In [328]: outvec=np.zeros((4,2))
...: for i in range(2):
...: outvec[:,i]=np.dot(mat[:,:,i],vec[:,i])
...:
In [329]: outvec
Out[329]:
array([[ 4. , -0.5 ],
[ 4. , 0. ],
[ 4. , 0.5 ],
[ 4. , 3.55]])
In [330]: # (4,4,2) (4,2) 'kji,ji->ki'
From your loop, the location of the i axis (size 2) is clear - last in all 3 arrays. That leaves one axis for vec; let's call that j. It pairs with the second-to-last axis of mat (the one next to i). k carries over from mat to outvec.
In [331]: np.einsum('kji,ji->ki', mat, vec)
Out[331]:
array([[ 4. , -0.5 ],
[ 4. , 0. ],
[ 4. , 0.5 ],
[ 4. , 3.55]])
Often the einsum string writes itself. For example, if mat were described as (m,n,k) and vec as (n,k), with the result being (m,k), the string would be 'mnk,nk->mk'. In this case only the j dimension is summed - it appears on the left, but not on the right. The last dimension, i in my notation, is not summed because it appears on both sides, just as it does in your iteration. I think of that as 'going-along-for-the-ride'. It isn't actively part of the dot product.
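A quick way to see this (a sketch, reusing the mat and vec from above): keep i in the output and it rides along; drop it and einsum sums over it too.
np.einsum('kji,ji->ki', mat, vec)   # shape (4, 2): i is carried to the output
np.einsum('kji,ji->k', mat, vec)    # shape (4,): i dropped, so summed; equals outvec.sum(axis=1)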
You are, in effect, stacking on the last dimension, the one of size 2. Usually we stack on the first, but you transposed both arrays to put it last.
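For comparison, here is a sketch of the first-axis layout (my own construction, not from the question): with the stack axis leading, no transposes are needed and the string starts with the stack index.
mat0 = np.array([[[0,1,2,3],[0,1,2,3],[0,1,2,3],[0,1,2,3]],
                 [[-1,2.,0,1.],[0,0,-1,2.],[0,1,-1,2.],[1,0.1,1,1]]])   # (2,4,4)
vec0 = np.stack((vec1, vec2))                                           # (2,4)
np.einsum('ijk,ik->ij', mat0, vec0)   # (2,4), the transpose of outvec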
Your 'failed' attempt runs, and can be reproduced as:
In [332]: np.einsum('ijk,il->ik', mat, vec)
Out[332]:
array([[12. , 4. ],
[ 6. , 1. ],
[12. , 4. ],
[ 6. , 3.1]])
In [333]: mat.sum(axis=1)*vec.sum(axis=1)[:,None]
Out[333]:
array([[12. , 4. ],
[ 6. , 1. ],
[12. , 4. ],
[ 6. , 3.1]])
The j and l dimensions don't appear on the right, so they are summed. They can be summed before multiplying because each appears in only one term. I added the None to enable broadcasting (multiplying an ik array by an i array). Equivalently:
np.einsum('ik,i->ik', mat.sum(axis=1), vec.sum(axis=1))
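As a sanity check (a sketch, reusing mat and vec), all three formulations of the 'failed' attempt agree:
r1 = np.einsum('ijk,il->ik', mat, vec)
r2 = mat.sum(axis=1) * vec.sum(axis=1)[:, None]
r3 = np.einsum('ik,i->ik', mat.sum(axis=1), vec.sum(axis=1))
assert np.allclose(r1, r2) and np.allclose(r1, r3)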
If you'd stacked on the first axis, and added a trailing dimension to vec to make it (2,4,1), it would matmul with a (2,4,4) mat: mat @ vec[...,None].
In [336]: v1 = vec.T               # the (2,4) stack of vectors
In [337]: m1 = mat.transpose(2,0,1)
In [338]: m1 @ v1[...,None]
Out[338]:
array([[[ 4. ],
[ 4. ],
[ 4. ],
[ 4. ]],
[[-0.5 ],
[ 0. ],
[ 0.5 ],
[ 3.55]]])
In [339]: _.shape
Out[339]: (2, 4, 1)
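The trailing size-1 axis can then be dropped to recover the transpose of outvec (a short follow-up sketch):
(m1 @ v1[..., None])[..., 0]       # shape (2, 4), equal to outvec.T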
Answer 2:
Read up on Einstein summation notation.
Basically, the rules are:
Without a ->
- Any letter repeated in the inputs represents an axis to be multiplied and summed over
- Any letter not repeated in the inputs is included in the output
With a ->
- Any letter repeated in the inputs represents an axis to be multiplied over
- Any letter not in the output represents an axis to be summed over
So, for example, with matrices A and B with the same shape:
np.einsum('ij, ij', A, B) # is A ddot B, returns 0d scalar
np.einsum('ij, jk', A, B) # is A dot B, returns 2d tensor
np.einsum('ij, kl', A, B) # is outer(A, B), returns 4d tensor
np.einsum('ji, jk, kl', A, B, A) # is A.T @ B @ A, returns 2d tensor
np.einsum('ij, ij -> ij', A, B) # is A * B, returns 2d tensor
np.einsum('ij, ij -> i' , A, A) # is norm(A, axis = 1)**2, returns 1d tensor
np.einsum('ii' , A) # is tr(A), returns 0d scalar
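These identities are easy to verify numerically (a minimal sketch with random square matrices; the names rng, A, B are mine):
import numpy as np
rng = np.random.default_rng(0)
A = rng.random((3, 3))
B = rng.random((3, 3))
assert np.isclose(np.einsum('ij, ij', A, B), (A * B).sum())              # ddot
assert np.allclose(np.einsum('ij, jk', A, B), A @ B)                     # dot
assert np.allclose(np.einsum('ji, jk, kl', A, B, A), A.T @ B @ A)        # triple product
assert np.allclose(np.einsum('ij, ij -> i', A, A), (A**2).sum(axis=1))   # squared row norms
assert np.isclose(np.einsum('ii', A), np.trace(A))                       # trace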
Answer 3:
einsum is easy (once you have played with permutations of indices for a while, that is...).
Let's work with something simple, a triple stack of 2×2 matrices and a triple stack of length-2 vectors:
import numpy as np
a = np.arange(3*2*2).reshape((3,2,2))
b = np.arange(3*2).reshape((3,2))
First we need to know what we are going to compute using einsum:
In [101]: for i in range(3):
...: print(a[i]@b[i])
[1 3]
[23 33]
[77 95]
What have we done? We have an index i that is fixed when we perform a dot product between one of the stacked matrices and one of the stacked vectors (both indexed by i), and each output line implies a summation over the last index of the stacked matrix and the lone index of the stacked vector.
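Written out as explicit loops (a sketch of the same computation, equivalent to the directive derived below):
out = np.zeros((3, 2), dtype=a.dtype)
for i in range(3):              # stack index: fixed for each matrix/vector pair
    for j in range(2):          # output row index: carried over from a
        for k in range(2):      # summed index: last axis of a, lone axis of b
            out[i, j] += a[i, j, k] * b[i, k]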
This is easily encoded in an einsum directive:
- we want the same i index to specify the matrix, the vector and also the output,
- we want to reduce along the last matrix index and the remaining vector index, say k,
- we want to have as many columns in the output as there are rows in each stacked matrix, say j.
Hence
In [102]: np.einsum('ijk,ik->ij', a, b)
Out[102]:
array([[ 1, 3],
[23, 33],
[77, 95]])
I hope that my discussion of how I got the directive right is clear, correct and useful.
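As a cross-check (my addition, not from the answer), the same stacked product can be written with matmul, which broadcasts over the leading stack axis:
(a @ b[..., None])[..., 0]     # shape (3, 2), identical to the einsum result above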
Source: https://stackoverflow.com/questions/52731784/numpy-einsum-notation-dot-product-of-a-stack-of-matrices-with-stack-of-vector