Question
I am interested in computing the gradient of a loss that is calculated from the product of a matrix multiplication in TensorFlow with Eager Execution. I can do so if the product is computed as a tensor, but not if it's assign()ed in place to a variable. Here is the greatly reduced code:
import tensorflow as tf
import numpy as np

tf.enable_eager_execution()

LEARNING_RATE = 0.01  # not defined in the original snippet; assumed constant

multipliers_net = tf.get_variable("multipliers", shape=(1, 3, 3, 1),
                                  initializer=tf.random_normal_initializer())
activations_net = tf.Variable(tf.ones_like(multipliers_net))
output_indices = [(0, 1, 2, 0)]

def step():
    global activations_net

    #### PROBLEMATIC ####
    activations_net.assign(multipliers_net * activations_net)

    #### NO PROBLEM ####
    # activations_net = multipliers_net * activations_net

    return tf.gather_nd(activations_net, output_indices)

def train(targets):
    for y in targets:
        with tf.GradientTape() as tape:
            out = step()
            print("OUT", out)
            loss = tf.reduce_mean(tf.square(y - out))
            print("LOSS", loss)

        de_dm = tape.gradient(loss, multipliers_net)
        print("GRADIENT", de_dm, sep="\n")
        multipliers_net.assign(LEARNING_RATE * de_dm)

targets = [[2], [3], [4], [5]]
train(targets)
As it stands, this code will show the correct OUT and LOSS values, but the GRADIENT will be printed as None. However, if the line below "PROBLEMATIC" is commented out and the "NO PROBLEM" line is uncommented, the gradient is computed just fine. I infer this is because in the second case activations_net becomes a Tensor, but I don't know why that suddenly makes the gradient computable, whereas before it was not.
I'm pretty sure that I should keep activations_net and multipliers_net as Variables, because in the larger scheme of things both are updated dynamically, and as I understand it such things are best kept as Variables rather than as constantly reassigned Tensors.
Answer 1:
I'll try to explain to the best of my knowledge. The problem occurs in this line:
de_dm = tape.gradient(loss, multipliers_net)
If you call print(tape.watched_variables()) in both the "PROBLEMATIC" and "NO PROBLEM" cases, you'll see that in the first case the tape 'watches' the same multipliers_net variable twice.
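
A quick diagnostic sketch of that check, reusing the question's step() and variables (the target value 2.0 is an arbitrary placeholder, not from the original post):

with tf.GradientTape() as tape:
    out = step()
    loss = tf.reduce_mean(tf.square(2.0 - out))

# Shows which variables the tape recorded in this run.
print(tape.watched_variables())
# In the "PROBLEMATIC" case, assign() breaks the differentiable path from
# multipliers_net to loss, so this still prints None:
print(tape.gradient(loss, multipliers_net))
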
You can try tape.reset() and tape.watch(), but they will have no effect as long as you pass the assign op into the tape.
If you try multipliers_net.assign(any_variable) inside tf.GradientTape(), you'll find that it won't work. But if you try assigning something that produces a tensor, e.g. tf.ones_like(), it will work.
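
Here is a minimal standalone sketch of that contrast; the variables v and w are illustrative, not from the question:

v = tf.Variable(tf.ones((2, 2)))
w = tf.Variable(tf.fill((2, 2), 3.0))

with tf.GradientTape() as tape:
    v.assign(w * v)            # assign() breaks the differentiable path
    loss = tf.reduce_sum(v)
print(tape.gradient(loss, w))  # prints None

with tf.GradientTape() as tape:
    t = w * v                  # plain tensor: the path stays intact
    loss = tf.reduce_sum(t)
print(tape.gradient(loss, w))  # prints a dense gradient (the values of v)
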
multipliers_net.assign(LEARNING_RATE * de_dm)
This works for the same reason; it seems to accept only eager tensors.
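
If you want to keep both nets as Variables, as the question intends, one pattern that should work is to compute the product as a plain tensor inside the tape and assign() it back only after the gradient has been taken. A minimal sketch, where LEARNING_RATE and the gradient-descent update via assign_sub are my assumptions, not from the original post:

def train_step(y):
    with tf.GradientTape() as tape:
        # Differentiable path: the product is an ordinary tensor here.
        new_activations = multipliers_net * activations_net
        out = tf.gather_nd(new_activations, output_indices)
        loss = tf.reduce_mean(tf.square(y - out))
    de_dm = tape.gradient(loss, multipliers_net)

    # State updates happen outside the tape, so they cannot break it.
    activations_net.assign(new_activations)
    multipliers_net.assign_sub(LEARNING_RATE * de_dm)  # assumed update rule
    return loss
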
Hope this helps.
Source: https://stackoverflow.com/questions/55155847/tensorflow-cannot-get-gradient-wrt-a-variable-but-can-wrt-a-tensor