I have some data represented by input_x. It is a tensor of unknown size (it should be inputted in batches), and each item there is of size n. input_x is embedded into a 3D tensor embed of shape [?, n, m], which I want to multiply by a 2D weight matrix U.
It seems that in TensorFlow 1.11.0 the docs for tf.matmul incorrectly say that it works for rank >= 2. Instead, the best clean alternative I've found is to use tf.tensordot(a, b, (-1, 0)) (docs). In its general form tf.tensordot(a, b, axis), this function gets the dot product of any axis of array a and any axis of array b. Providing axis as (-1, 0) gets the standard dot product of two arrays.
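For instance, to multiply a batch of input vectors by a single weight matrix (a minimal sketch; the shapes and the name w are made up for illustration):
import tensorflow as tf
# Made-up shapes: a batch of 32 items of size n = 4, weights of shape n x m.
input_x = tf.random_normal((32, 4))      # (batch, n)
w = tf.random_normal((4, 5))             # (n, m)
# Contract the last axis of input_x with the first axis of w.
y = tf.tensordot(input_x, w, (-1, 0))    # y has shape (32, 5)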
The matmul operation only works on matrices (2D tensors). Here are two main approaches to do this; both assume that U is a 2D tensor.
1. Slice embed into 2D tensors and multiply each of them with U individually. This is probably easiest to do using tf.scan(), like this:
h = tf.scan(lambda a, x: tf.matmul(x, U), embed)
2. On the other hand, if efficiency is important, it may be better to reshape embed into a 2D tensor, so the multiplication can be done with a single matmul, like this:
embed = tf.reshape(embed, [-1, m])
h = tf.matmul(embed, U)
h = tf.reshape(h, [-1, n, c])
where c is the number of columns in U. The last reshape will make sure that h is a 3D tensor where the 0th dimension corresponds to the batch, just like the original x_input and embed.
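For reference, here is a self-contained sketch of the reshape approach, with made-up shapes:
import tensorflow as tf
n, m, c = 10, 3, 5
embed = tf.random_normal((8, n, m))      # a batch of 8 matrices of shape n x m
U = tf.random_normal((m, c))
embed_flat = tf.reshape(embed, [-1, m])  # (8 * n, m)
h = tf.matmul(embed_flat, U)             # (8 * n, c)
h = tf.reshape(h, [-1, n, c])            # (8, n, c), batch dimension restored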
Case 1: multiplying a batch of matrices by another batch of matrices.
M = tf.random_normal((batch_size, n, m))
N = tf.random_normal((batch_size, m, p))
# python >= 3.5
MN = M @ N
# or the old way,
MN = tf.matmul(M, N)
# MN has shape (batch_size, n, p)
Case 2: multiplying a batch of matrices by a batch of vectors. We fall back to case 1 by adding and then removing a dimension on v.
M = tf.random_normal((batch_size, n, m))
v = tf.random_normal((batch_size, m))
Mv = (M @ v[..., None])[..., 0]
# Mv has shape (batch_size, n)
Case 3.1: a batch of matrices multiplied by a single matrix on the right. In this case, we cannot simply add a batch dimension of 1 to the single matrix, because tf.matmul does not broadcast in the batch dimension. Instead, we can treat the batch of matrices as one large matrix, using a simple reshape.
M = tf.random_normal((batch_size, n, m))
N = tf.random_normal((m, p))
MN = tf.reshape(tf.reshape(M, [-1, m]) @ N, [-1, n, p])
# MN has shape (batch_size, n, p)
Case 3.2: a single matrix on the left multiplying a batch of matrices. This case is more complicated; we can fall back to case 3.1 by transposing the matrices, since (M N)^T = N^T M^T.
M = tf.random_normal((n, m))
N = tf.random_normal((batch_size, m, p))
MT = tf.matrix_transpose(M)
NT = tf.matrix_transpose(N)
NTMT = tf.reshape(tf.reshape(NT, [-1, m]) @ MT, [-1, p, n])
MN = tf.matrix_transpose(NTMT)
# MN has shape (batch_size, n, p)
However, transposition can be a costly operation, and here it is done twice on an entire batch of matrices. It may be better to simply duplicate M to match the batch dimension:
MN = tf.tile(M[None], [batch_size, 1, 1]) @ N
Profiling will tell which option works better for a given problem/hardware combination.
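For example, a rough timing sketch along these lines (TF 1.x session style; all shapes are made up) could compare the two variants:
import time
import tensorflow as tf
batch_size, n, m, p = 64, 32, 32, 32
M = tf.random_normal((n, m))
N = tf.random_normal((batch_size, m, p))
# Option 1: fall back to case 3.1 through a double transposition.
NTMT = tf.reshape(
    tf.reshape(tf.matrix_transpose(N), [-1, m]) @ tf.matrix_transpose(M),
    [-1, p, n])
via_transpose = tf.matrix_transpose(NTMT)
# Option 2: tile M across the batch dimension.
via_tile = tf.tile(M[None], [batch_size, 1, 1]) @ N
with tf.Session() as sess:
    for name, op in [('transpose', via_transpose), ('tile', via_tile)]:
        sess.run(op)                    # warm-up run
        start = time.time()
        for _ in range(100):
            sess.run(op)
        print(name, time.time() - start)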
Case 4: a single matrix multiplying a batch of vectors. This looks similar to case 3.2, since the single matrix is on the left, but it is actually simpler because transposing a vector is essentially a no-op. We end up with:
M = tf.random_normal((n, m))
v = tf.random_normal((batch_size, m))
MT = tf.matrix_transpose(M)
Mv = v @ MT
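# Mv has shape (batch_size, n)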
What about einsum? All of the previous multiplications could have been written with the tf.einsum Swiss Army knife. For example, the first solution for case 3.2 could be written simply as:
MN = tf.einsum('nm,bmp->bnp', M, N)
However, note that einsum ultimately relies on transpose and matmul for the computation. So even though einsum is a very convenient way to write matrix multiplications, it hides the complexity of the operations underneath: for example, it is not straightforward to guess how many times an einsum expression will transpose your data, and therefore how costly the operation will be. It may also hide the fact that there are several alternatives for the same operation (see case 3.2), and it might not necessarily choose the better one. For this reason, I would personally use explicit formulas like those above to better convey their respective complexity. That said, if you know what you are doing and like the simplicity of the einsum syntax, then by all means go for it.
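For completeness, here is how the other cases could be spelled with einsum (a sketch; the letters match the shapes used in the cases above):
MN = tf.einsum('bnm,bmp->bnp', M, N)  # case 1: batch of matrices times batch of matrices
Mv = tf.einsum('bnm,bm->bn', M, v)    # case 2: batch of matrices times batch of vectors
MN = tf.einsum('bnm,mp->bnp', M, N)   # case 3.1: batch of matrices times a single matrix
Mv = tf.einsum('nm,bm->bn', M, v)     # case 4: single matrix times a batch of vectors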
Previous answers are obsolete. Currently tf.matmul() supports tensors with rank > 2:
The inputs must be matrices (or tensors of rank > 2, representing batches of matrices), with matching inner dimensions, possibly after transposition.
Also, tf.batch_matmul() was removed, and tf.matmul() is the right way to do batch multiplication. The main idea can be understood from the following code:
import tensorflow as tf
batch_size, n, m, k = 10, 3, 5, 2
A = tf.Variable(tf.random_normal(shape=(batch_size, n, m)))
B = tf.Variable(tf.random_normal(shape=(batch_size, m, k)))
tf.matmul(A, B)
Now you will receive a tensor of shape (batch_size, n, k). Here is what is going on: assume you have batch_size matrices of shape n x m and batch_size matrices of shape m x k. For each pair of them, you compute the n x m times m x k product, which gives you an n x k matrix, and you end up with batch_size of them.
Notice that something like this is also valid:
A = tf.Variable(tf.random_normal(shape=(a, b, n, m)))
B = tf.Variable(tf.random_normal(shape=(a, b, m, k)))
tf.matmul(A, B)
and will give you a tensor of shape (a, b, n, k).
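A quick shape check (TF 1.x style, with made-up dimensions) confirms this:
import tensorflow as tf
a, b, n, m, k = 2, 3, 4, 5, 6
A = tf.random_normal((a, b, n, m))
B = tf.random_normal((a, b, m, k))
C = tf.matmul(A, B)
print(C.shape)  # (2, 3, 4, 6)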
As answered by @Stryke, there are two ways to achieve this: 1. scanning, and 2. reshaping.
tf.scan requires lambda functions and is generally used for recursive operations. Some examples are here: https://rdipietro.github.io/tensorflow-scan-examples/
I personally prefer reshaping, since it is more intuitive. If you are trying to matrix-multiply each matrix in a 3D tensor by a 2D matrix, as in C_ijl = A_ijk * B_kl (summing over k, where A has shape (i, j, k) and B has shape (k, l)), you can do it with a simple reshape.
A_flat = tf.reshape(A, [i * j, k])   # collapse the first two dimensions into rows
C_flat = tf.matmul(A_flat, B)        # ordinary 2D matmul: (i*j, k) x (k, l)
C = tf.reshape(C_flat, [i, j, l])    # restore the leading dimensions
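As a concrete, runnable instantiation with made-up dimensions (i, j, k, l) = (2, 3, 4, 5):
import tensorflow as tf
A = tf.random_normal((2, 3, 4))
B = tf.random_normal((4, 5))
A_flat = tf.reshape(A, [2 * 3, 4])  # collapse (i, j) into rows
C_flat = tf.matmul(A_flat, B)       # (6, 4) x (4, 5) -> (6, 5)
C = tf.reshape(C_flat, [2, 3, 5])   # restore (i, j, l)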