I am attempting to calculate the gradient norm with respect to the weights of a neural network with Keras (as a diagnostic tool). Eventually, I want to create a callback for this.
Extending josteinb's comment, I'm sharing the version that I have used. It is basically the same as the previous answer, but this version integrates the norm computation into the usual training routine.
import keras.backend as K

# Get an "L2 norm of gradients" tensor
def get_gradient_norm(model):
    with K.name_scope('gradient_norm'):
        grads = K.gradients(model.total_loss, model.trainable_weights)
        norm = K.sqrt(sum([K.sum(K.square(g)) for g in grads]))
    return norm
# Build a model
model = Model(...)
# Compile the model
model.compile(
    loss="categorical_crossentropy",
    optimizer="adam",
    metrics=["categorical_accuracy"],
)
# Append the "l2 norm of gradients" tensor as a metric
model.metrics_names.append("gradient_norm")
model.metrics_tensors.append(get_gradient_norm(model))
# You can compute the norm within the usual training routine
loss, acc, gradient_norm = model.train_on_batch(batch, label)
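Since the original question asks for a callback: once the norm is registered as a metric this way, it should also appear in the logs dict that Keras passes to callbacks during model.fit(), so a minimal logging callback could look like the sketch below (GradientNormLogger is an illustrative name, not part of Keras):

from keras.callbacks import Callback

class GradientNormLogger(Callback):
    # Assumes "gradient_norm" was appended to the model's metrics as above
    def on_batch_end(self, batch, logs=None):
        logs = logs or {}
        if 'gradient_norm' in logs:
            print('batch %d: gradient norm = %.6f' % (batch, logs['gradient_norm']))

model.fit(x, y, callbacks=[GradientNormLogger()])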
There are several placeholders related to the gradient computation process in Keras:

1. Input x
2. Target y
3. Sample weights: even if you don't provide it in model.fit(), Keras still generates a placeholder for sample weights, and feeds np.ones((y.shape[0],), dtype=K.floatx()) into the graph during training.
4. Learning phase: this placeholder is connected to the gradient tensors only if there is any layer using it (e.g. Dropout).

So, in your provided example, in order to compute the gradients, you need to feed x, y and sample_weights into the graph. That's the underlying reason for the error.
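Concretely, for a single-input, single-output model the feed is just these three arrays in order (a sketch; x and y stand for whatever batch you want to evaluate, and the all-ones array mirrors the default sample weights Keras feeds):

import numpy as np
import keras.backend as K

# Order matches _feed_inputs + _feed_targets + _feed_sample_weights
sample_weights = np.ones((y.shape[0],), dtype=K.floatx())
feed = [x, y, sample_weights]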
Inside Model._make_train_function(), the following lines show how to construct the necessary inputs to K.function() in this case:
inputs = self._feed_inputs + self._feed_targets + self._feed_sample_weights
if self.uses_learning_phase and not isinstance(K.learning_phase(), int):
    inputs += [K.learning_phase()]

with K.name_scope('training'):
    ...
    self.train_function = K.function(inputs,
                                     [self.total_loss] + self.metrics_tensors,
                                     updates=updates,
                                     name='train_function',
                                     **self._function_kwargs)
By mimicking this function, you should be able to get the norm value:
import numpy as np
import keras.backend as K
from keras.models import Sequential
from keras.layers import Dense

def get_gradient_norm_func(model):
    # L2 norm of the gradient of the total loss w.r.t. all trainable weights
    grads = K.gradients(model.total_loss, model.trainable_weights)
    summed_squares = [K.sum(K.square(g)) for g in grads]
    norm = K.sqrt(sum(summed_squares))
    # The gradient tensors depend on the input, target and sample-weight
    # placeholders, so all three must be fed (see above)
    inputs = model.model._feed_inputs + model.model._feed_targets + model.model._feed_sample_weights
    func = K.function(inputs, [norm])
    return func

def main():
    # Toy regression problem: y = 2x
    x = np.random.random((128,)).reshape((-1, 1))
    y = 2 * x
    model = Sequential(layers=[Dense(2, input_shape=(1,)),
                               Dense(1)])
    model.compile(loss='mse', optimizer='rmsprop')
    get_gradient = get_gradient_norm_func(model)
    history = model.fit(x, y, epochs=1)
    # Default sample weights are all ones
    print(get_gradient([x, y, np.ones(len(y))]))

if __name__ == '__main__':
    main()
Execution output:
Epoch 1/1
128/128 [==============================] - 0s - loss: 2.0073
[4.4091368]
Note that since you're using Sequential instead of Model, model.model._feed_* is required instead of model._feed_*.
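One caveat related to the placeholder list above: if the model contains layers that use the learning phase (e.g. Dropout), the gradient tensors also depend on K.learning_phase(), so it has to be appended to the inputs, mirroring _make_train_function(). A sketch under that assumption (the function name is illustrative):

def get_gradient_norm_func_with_phase(model):
    grads = K.gradients(model.total_loss, model.trainable_weights)
    norm = K.sqrt(sum([K.sum(K.square(g)) for g in grads]))
    inputs = (model.model._feed_inputs + model.model._feed_targets
              + model.model._feed_sample_weights)
    # Mirror _make_train_function(): append the learning-phase flag if needed
    if model.uses_learning_phase and not isinstance(K.learning_phase(), int):
        inputs += [K.learning_phase()]
    return K.function(inputs, [norm])

# Call with the phase flag appended: 1 = training phase, 0 = test phase
# get_gradient([x, y, np.ones(len(y)), 1])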