Calculating gradient norm w.r.t. weights with Keras


I am attempting to calculate the gradient norm with respect to the weights of a neural network with Keras (as a diagnostic tool). Eventually, I want to create a callback for

2 Answers
  • 2021-01-01 03:51

    Extending josteinb's comment, I'm sharing the version that I have used.

    Basically the same as the previous answer, but this version integrates the norm computation into the usual training routine.

    import keras.backend as K
    from keras.models import Model
    
    # Get a "l2 norm of gradients" tensor
    def get_gradient_norm(model):
        with K.name_scope('gradient_norm'):
            grads = K.gradients(model.total_loss, model.trainable_weights)
            norm = K.sqrt(sum([K.sum(K.square(g)) for g in grads]))
        return norm
    
    # Build a model
    model = Model(...)
    
    # Compile the model
    model.compile(
        loss="categorical_crossentropy",
        optimizer="adam",
        metrics=["categorical_accuracy"],
    )
    
    # Append the "l2 norm of gradients" tensor as a metric
    model.metrics_names.append("gradient_norm")
    model.metrics_tensors.append(get_gradient_norm(model))
    
    # You can compute the norm within the usual training routine
    loss, acc, gradient_norm = model.train_on_batch(batch, label)
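
    Since the original question mentions eventually wanting a callback: once the tensor is registered as a metric like this, the value also shows up in the logs dict passed to callbacks during model.fit(). A minimal sketch, assuming an older Keras 2.x where model.metrics_tensors is still writable; GradientNormLogger is just an illustrative name:

    from keras.callbacks import Callback
    
    class GradientNormLogger(Callback):
        # Collects the "gradient_norm" metric reported after every batch
        def on_train_begin(self, logs=None):
            self.norms = []
    
        def on_batch_end(self, batch, logs=None):
            logs = logs or {}
            if "gradient_norm" in logs:
                self.norms.append(float(logs["gradient_norm"]))
    
    # model.fit(x, y, callbacks=[GradientNormLogger()])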
    
  • 2021-01-01 03:53

    There are several placeholders related to the gradient computation process in Keras:

    1. Input x
    2. Target y
    3. Sample weights: even if you don't provide them in model.fit(), Keras still generates a placeholder for sample weights and feeds np.ones((y.shape[0],), dtype=K.floatx()) into the graph during training.
    4. Learning phase: this placeholder will be connected to the gradient tensor only if there's any layer using it (e.g. Dropout).

    So, in your provided example, in order to compute the gradients, you need to feed x, y and sample_weights into the graph. That's the underlying reason for the error.

    Inside Model._make_train_function(), the following lines show how the necessary inputs to K.function() are constructed in this case:

    inputs = self._feed_inputs + self._feed_targets + self._feed_sample_weights
    if self.uses_learning_phase and not isinstance(K.learning_phase(), int):
        inputs += [K.learning_phase()]
    
    with K.name_scope('training'):
        ...
        self.train_function = K.function(inputs,
                                         [self.total_loss] + self.metrics_tensors,
                                         updates=updates,
                                         name='train_function',
                                         **self._function_kwargs)
    

    By mimicking this function, you should be able to get the norm value:

    import numpy as np
    import keras.backend as K
    from keras.models import Sequential
    from keras.layers import Dense
    
    def get_gradient_norm_func(model):
        # l2 norm of the gradients of the total loss w.r.t. all trainable weights
        grads = K.gradients(model.total_loss, model.trainable_weights)
        summed_squares = [K.sum(K.square(g)) for g in grads]
        norm = K.sqrt(sum(summed_squares))
        # Feed inputs, targets and sample weights, mimicking _make_train_function()
        inputs = model.model._feed_inputs + model.model._feed_targets + model.model._feed_sample_weights
        func = K.function(inputs, [norm])
        return func
    
    def main():
        x = np.random.random((128,)).reshape((-1, 1))
        y = 2 * x
        model = Sequential(layers=[Dense(2, input_shape=(1,)),
                                   Dense(1)])
        model.compile(loss='mse', optimizer='rmsprop')
        get_gradient = get_gradient_norm_func(model)
        history = model.fit(x, y, epochs=1)
        # Sample weights default to ones, matching what fit() feeds internally
        print(get_gradient([x, y, np.ones(len(y))]))
    
    if __name__ == '__main__':
        main()
    

    Execution output:

    Epoch 1/1
    128/128 [==============================] - 0s - loss: 2.0073     
    [4.4091368]
    

    Note that since you're using Sequential instead of Model, model.model._feed_* is required instead of model._feed_*.
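
    As noted in the placeholder list above, if the model contains layers that use the learning phase (e.g. Dropout), the gradient tensor also depends on K.learning_phase(), which then has to be appended to the inputs, mirroring the _make_train_function() snippet. A sketch under that assumption (the function name is illustrative):

    def get_gradient_norm_func_with_phase(model):
        grads = K.gradients(model.total_loss, model.trainable_weights)
        norm = K.sqrt(sum([K.sum(K.square(g)) for g in grads]))
        inputs = (model.model._feed_inputs
                  + model.model._feed_targets
                  + model.model._feed_sample_weights)
        # Mirror _make_train_function(): add the learning-phase placeholder when needed
        if model.model.uses_learning_phase and not isinstance(K.learning_phase(), int):
            inputs += [K.learning_phase()]
        return K.function(inputs, [norm])
    
    # Feed 0 (test mode) or 1 (train mode) as the last input:
    # get_gradient = get_gradient_norm_func_with_phase(model)
    # get_gradient([x, y, np.ones(len(y)), 0])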
