Question
I have a network built with InceptionNet, and for an input sample bx I want to compute the gradients of the model output w.r.t. a hidden layer. I have the following code:
bx = tf.reshape(x_batch[0, :, :, :], (1, 299, 299, 3))
with tf.GradientTape() as gtape:
    #gtape.watch(x)
    preds = model(bx)
    print(preds.shape, end=' ')
    class_idx = np.argmax(preds[0])
    print(class_idx, end=' ')
    class_output = model.output[:, class_idx]
    print(class_output, end=' ')
    last_conv_layer = model.get_layer('inception_v3').get_layer('mixed10')
    #gtape.watch(last_conv_layer)
    print(last_conv_layer)
    grads = gtape.gradient(class_output, last_conv_layer.output)#[0]
    print(grads)
But this gives None. I tried gtape.watch(bx) as well, and it still gives None.
Before trying GradientTape, I tried tf.keras.backend.gradients, but that raised the following error:
RuntimeError: tf.gradients is not supported when eager execution is enabled. Use tf.GradientTape instead.
My model is as follows:
model.summary()
Model: "sequential_4"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
inception_v3 (Model) (None, 1000) 23851784
_________________________________________________________________
dense_5 (Dense) (None, 2) 2002
=================================================================
Total params: 23,853,786
Trainable params: 23,819,354
Non-trainable params: 34,432
_________________________________________________________________
Any solution is appreciated; it doesn't have to use GradientTape if there is another way to compute these gradients.
Answer 1:
You can use the tape to compute the gradient of an output node w.r.t. a set of watchable objects. By default, trainable variables are watched by the tape, and you can access the trainable variables of a specific layer by looking it up by name and reading its trainable_variables property.
E.g., in the code below I compute the gradients of the prediction with respect to the variables of the first FC layer only (named "fc1"), treating every other variable as a constant.
import tensorflow as tf

model = tf.keras.models.Sequential(
    [
        tf.keras.layers.Dense(10, input_shape=(3,), name="fc1", activation="relu"),
        tf.keras.layers.Dense(3, name="fc2"),
    ]
)

# The input must match the (3,) input shape declared above.
inputs = tf.ones((1, 3))

with tf.GradientTape() as tape:
    preds = model(inputs)

# Gradients of the prediction w.r.t. the variables of "fc1" only.
grads = tape.gradient(preds, model.get_layer("fc1").trainable_variables)
print(grads)
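Here grads is a list with one entry per trainable variable of fc1 (its kernel and its bias), each entry having the same shape as the corresponding variable.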
Answer 2:
I had the same problem as you. I'm not sure if this is the cleanest way to solve it, but here's my solution.
I think the problem is that you need to pass the actual return value of last_conv_layer.call(...) as an argument to tape.watch(). Since all layers are called sequentially within the scope of the model(bx) call, you have to somehow inject some code into this inner scope. I did this using the following decorator:
def watch_layer(layer, tape):
    """
    Make an intermediate hidden `layer` watchable by the `tape`.
    After calling this function, you can obtain the gradient with
    respect to the output of the `layer` by calling:

        grads = tape.gradient(..., layer.result)
    """
    def decorator(func):
        def wrapper(*args, **kwargs):
            # Store the result of `layer.call` internally.
            layer.result = func(*args, **kwargs)
            # From this point onwards, watch this tensor.
            tape.watch(layer.result)
            # Return the result to continue with the forward pass.
            return layer.result
        return wrapper

    layer.call = decorator(layer.call)
    return layer
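Note that this wraps layer.call in place on the layer instance, so calling watch_layer twice stacks the wrappers; rebuild the model (or keep a reference to the original layer.call and restore it) if you need to run this more than once.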
In your example, I believe the following should then work for you:
bx = tf.reshape(x_batch[0, :, :, :], (1, 299, 299, 3))
last_conv_layer = model.get_layer('inception_v3').get_layer('mixed10')
with tf.GradientTape() as gtape:
    # Make the `last_conv_layer` watchable.
    watch_layer(last_conv_layer, gtape)
    preds = model(bx)
    class_idx = np.argmax(preds[0])
    # Use the eager prediction tensor, not the symbolic `model.output`.
    class_output = preds[:, class_idx]
# Get the gradient w.r.t. the output of `last_conv_layer`.
grads = gtape.gradient(class_output, last_conv_layer.result)
print(grads)
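Since the question explicitly allows approaches other than GradientTape hooks, here is a minimal alternative sketch, assuming the same sequential model as in the question (the inception_v3 base followed by the dense_5 head from the summary above): build a second functional model that exposes both the mixed10 activation and the InceptionV3 output, so no patching of layer.call is needed.

inception = model.get_layer('inception_v3')
# A functional model that returns both the hidden activation and the
# InceptionV3 predictions in one forward pass.
grad_model = tf.keras.models.Model(
    inputs=inception.input,
    outputs=[inception.get_layer('mixed10').output, inception.output],
)

with tf.GradientTape() as gtape:
    conv_output, inception_preds = grad_model(bx)
    # Re-apply the final Dense head of the sequential model.
    preds = model.get_layer('dense_5')(inception_preds)
    class_output = preds[:, np.argmax(preds[0])]

# Gradient of the class score w.r.t. the `mixed10` activation;
# `conv_output` was produced inside the tape's context, so the tape
# can differentiate with respect to it directly.
grads = gtape.gradient(class_output, conv_output)
print(grads)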
Source: https://stackoverflow.com/questions/56478454/in-tensorflow-2-0-with-eager-execution-how-to-compute-the-gradients-of-a-networ