问题
I am interested in computing the derivative of a matrix determinant using TensorFlow. I can see from experimentation that TensorFlow has not implemented a method of differentiating through a determinant:
LookupError: No gradient defined for operation 'MatrixDeterminant'
(op type: MatrixDeterminant)
A little further investigation revealed that it is actually possible to compute the derivative; see for example Jacobi's formula. I determined that in order to implement this means of differentiating through a determinant that I need to use the function decorator,
@tf.RegisterGradient("MatrixDeterminant")
def _sub_grad(op, grad):
...
However, I am not familiar enough with tensor flow to understand how this can be accomplished. Does anyone have any insight on this matter?
Here's an example where I run into this issue:
x = tf.Variable(tf.ones(shape=[1]))
y = tf.Variable(tf.ones(shape=[1]))
A = tf.reshape(
tf.pack([tf.sin(x), tf.zeros([1, ]), tf.zeros([1, ]), tf.cos(y)]), (2,2)
)
loss = tf.square(tf.matrix_determinant(A))
optimizer = tf.train.GradientDescentOptimizer(0.001)
train = optimizer.minimize(loss)
init = tf.initialize_all_variables()
sess = tf.Session()
sess.run(init)
for step in xrange(100):
sess.run(train)
print sess.run(x)
回答1:
Please check "Implement Gradient in Python" section here
In particular, you can implement it as follows
@ops.RegisterGradient("MatrixDeterminant")
def _MatrixDeterminantGrad(op, grad):
"""Gradient for MatrixDeterminant. Use formula from 2.2.4 from
An extended collection of matrix derivative results for forward and reverse
mode algorithmic differentiation by Mike Giles
-- http://eprints.maths.ox.ac.uk/1079/1/NA-08-01.pdf
"""
A = op.inputs[0]
C = op.outputs[0]
Ainv = tf.matrix_inverse(A)
return grad*C*tf.transpose(Ainv)
Then a simple training loop to check that it works:
a0 = np.array([[1,2],[3,4]]).astype(np.float32)
a = tf.Variable(a0)
b = tf.square(tf.matrix_determinant(a))
init_op = tf.initialize_all_variables()
sess = tf.InteractiveSession()
init_op.run()
minimization_steps = 50
learning_rate = 0.001
optimizer = tf.train.GradientDescentOptimizer(learning_rate)
train_op = optimizer.minimize(b)
losses = []
for i in range(minimization_steps):
train_op.run()
losses.append(b.eval())
Then you can visualize your loss over time
import matplotlib.pyplot as plt
plt.ylabel("Determinant Squared")
plt.xlabel("Iterations")
plt.plot(losses)
Should see something like this
回答2:
I think you are confused with what is a derivative of a matrix determinant.
Matrix determinant is a function which is calculated over the elements of the matrix by some formula. So if all the elements of the matrix are numbers, you the determinant will you you just one number and the derivative will be 0
. When some of the elements are variables, you will get an expression of these variables. For example:
x, x^2
1, sin(x)
The determinant will be x*sin(x) - x^2
and the derivative is 2x + sin(x) + x*cos(x)
. The Jacobi formula just connects the determinant with adjunct matrix.
In your example your matrix A
consists of only numbers and therefore the determinant is just a number and the loss
is just a number as well. GradientDescentOptimizer
needs to have some free variables to minimize and does not have any because your loss
is just a number.
回答3:
For those who are interested, I discovered the solution that works on my problems:
@tf.RegisterGradient("MatrixDeterminant")
def _MatrixDeterminant(op, grad):
"""Gradient for MatrixDeterminant."""
return op.outputs[0] * tf.transpose(tf.matrix_inverse(op.inputs[0]))
来源:https://stackoverflow.com/questions/33714832/matrix-determinant-differentiation-in-tensorflow