Calculating gradient norm w.r.t. weights with Keras


I am attempting to calculate the gradient norm with respect to the weights of a neural network with Keras (as a diagnostic tool). Eventually, I want to create a callback for

2 Answers
  • 2021-01-01 03:51

    Extending josteinb's comment, I'm sharing the version that I have used.

    Basically the same as the previous answer, but this version integrates the norm computation into the usual training routine.

    import keras.backend as K
    from keras.models import Model
    
    # Get a "l2 norm of gradients" tensor
    def get_gradient_norm(model):
        with K.name_scope('gradient_norm'):
            grads = K.gradients(model.total_loss, model.trainable_weights)
            norm = K.sqrt(sum([K.sum(K.square(g)) for g in grads]))
        return norm
    
    # Build a model
    model = Model(...)
    
    # Compile the model
    model.compile(
        loss="categorical_crossentropy",
        optimizer="adam",
        metrics=["categorical_accuracy"],
    )
    
    # Append the "l2 norm of gradients" tensor as a metric
    model.metrics_names.append("gradient_norm")
    model.metrics_tensors.append(get_gradient_norm(model))
    
    # You can compute the norm within the usual training routine
    loss, acc, gradient_norm = model.train_on_batch(batch, label)
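
    Since the original question mentions eventually wanting a callback: once the tensor is registered as a metric like this, the value also shows up in the logs dict passed to callbacks during model.fit(). A minimal sketch, assuming an older Keras 2.x where model.metrics_tensors is still writable; GradientNormLogger is just an illustrative name:

    from keras.callbacks import Callback
    
    class GradientNormLogger(Callback):
        # Collects the "gradient_norm" metric reported after every batch
        def on_train_begin(self, logs=None):
            self.norms = []
    
        def on_batch_end(self, batch, logs=None):
            logs = logs or {}
            if "gradient_norm" in logs:
                self.norms.append(float(logs["gradient_norm"]))
    
    # model.fit(x, y, callbacks=[GradientNormLogger()])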
    
  • 2021-01-01 03:53

    There are several placeholders related to the gradient computation process in Keras:

    1. Input x
    2. Target y
    3. Sample weights: even if you don't provide them in model.fit(), Keras still generates a placeholder for sample weights and feeds np.ones((y.shape[0],), dtype=K.floatx()) into the graph during training.
    4. Learning phase: this placeholder will be connected to the gradient tensor only if there's any layer using it (e.g. Dropout).

    So, in your provided example, in order to compute the gradients, you need to feed x, y and sample_weights into the graph. That's the underlying reason for the error.

    Inside Model._make_train_function(), the following lines show how the necessary inputs to K.function() are constructed in this case:

    inputs = self._feed_inputs + self._feed_targets + self._feed_sample_weights
    if self.uses_learning_phase and not isinstance(K.learning_phase(), int):
        inputs += [K.learning_phase()]
    
    with K.name_scope('training'):
        ...
        self.train_function = K.function(inputs,
                                         [self.total_loss] + self.metrics_tensors,
                                         updates=updates,
                                         name='train_function',
                                         **self._function_kwargs)
    

    By mimicking this function, you should be able to get the norm value:

    import numpy as np
    import keras.backend as K
    from keras.models import Sequential
    from keras.layers import Dense
    
    def get_gradient_norm_func(model):
        # l2 norm of the gradients of the total loss w.r.t. all trainable weights
        grads = K.gradients(model.total_loss, model.trainable_weights)
        summed_squares = [K.sum(K.square(g)) for g in grads]
        norm = K.sqrt(sum(summed_squares))
        # Feed inputs, targets and sample weights, mimicking _make_train_function()
        inputs = model.model._feed_inputs + model.model._feed_targets + model.model._feed_sample_weights
        func = K.function(inputs, [norm])
        return func
    
    def main():
        x = np.random.random((128,)).reshape((-1, 1))
        y = 2 * x
        model = Sequential(layers=[Dense(2, input_shape=(1,)),
                                   Dense(1)])
        model.compile(loss='mse', optimizer='rmsprop')
        get_gradient = get_gradient_norm_func(model)
        history = model.fit(x, y, epochs=1)
        # Sample weights default to ones, matching what fit() feeds internally
        print(get_gradient([x, y, np.ones(len(y))]))
    
    if __name__ == '__main__':
        main()
    

    Execution output:

    Epoch 1/1
    128/128 [==============================] - 0s - loss: 2.0073     
    [4.4091368]
    

    Note that since you're using Sequential instead of Model, model.model._feed_* is required instead of model._feed_*.
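
    As noted in the placeholder list above, if the model contains layers that use the learning phase (e.g. Dropout), the gradient tensor also depends on K.learning_phase(), which then has to be appended to the inputs, mirroring the _make_train_function() snippet. A sketch under that assumption (the function name is illustrative):

    def get_gradient_norm_func_with_phase(model):
        grads = K.gradients(model.total_loss, model.trainable_weights)
        norm = K.sqrt(sum([K.sum(K.square(g)) for g in grads]))
        inputs = (model.model._feed_inputs
                  + model.model._feed_targets
                  + model.model._feed_sample_weights)
        # Mirror _make_train_function(): add the learning-phase placeholder when needed
        if model.model.uses_learning_phase and not isinstance(K.learning_phase(), int):
            inputs += [K.learning_phase()]
        return K.function(inputs, [norm])
    
    # Feed 0 (test mode) or 1 (train mode) as the last input:
    # get_gradient = get_gradient_norm_func_with_phase(model)
    # get_gradient([x, y, np.ones(len(y)), 0])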
