Regularization for LSTM in TensorFlow

庸人自扰 2021-02-05 15:10

TensorFlow offers a nice LSTM wrapper.

rnn_cell.BasicLSTMCell(num_units, forget_bias=1.0, input_size=None,
                       state_is_tuple=False, activation=tanh)
        
3 Answers
  • 2021-02-05 15:26

    tf.trainable_variables() gives you a list of Variable objects that you can use to build the L2 regularization term. Note that this adds regularization for all variables in your model. If you want to restrict the L2 term to a subset of the weights, create those variables inside a tf.variable_scope so that their names share a specific prefix (tf.name_scope only prefixes variables created with tf.Variable, not those created with tf.get_variable, which the RNN cells use), and then use that prefix to filter the list returned by tf.trainable_variables().
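    As a rough, framework-free illustration of filtering by prefix, here is a plain-Python sketch. It computes what tf.nn.l2_loss computes (half the sum of squares) over a hypothetical dict that maps variable names to flattened values; the names "lstm/..." and "dense/..." are made up for the example. In real TF 1.x code you would iterate over tf.trainable_variables() and test var.name instead.

```python
def l2_loss(values):
    """Half the sum of squares, the quantity tf.nn.l2_loss computes."""
    return sum(v * v for v in values) / 2.0


def prefixed_l2_penalty(variables, prefix, scale):
    """Scaled L2 penalty over only the variables whose names start with prefix."""
    return scale * sum(
        l2_loss(values)
        for name, values in variables.items()
        if name.startswith(prefix)
    )


# Hypothetical variable names mapped to flattened values.
variables = {
    "lstm/kernel:0": [1.0, -2.0],
    "lstm/bias:0": [0.5],
    "dense/kernel:0": [3.0],
}

# Only the two "lstm/" variables contribute; "dense/kernel:0" is ignored.
penalty = prefixed_l2_penalty(variables, "lstm/", scale=0.01)
print(penalty)
```

    The prefix is the only thing tying a variable to the penalty, so it pays to choose scope names that cannot accidentally match unrelated variables.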

  • 2021-02-05 15:33

    I like to do the following. The only caveat I know of is that some parameters are better left out of L2 regularization, such as batch-norm parameters and biases. An LSTM cell contains a single bias tensor (conceptually it has one bias per gate, but they are concatenated for performance), and for batch normalization I add "noreg" to the variables' names so that they are skipped too.

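        import tensorflow as tf  # TensorFlow 1.x graph-mode API

        # `loss` is your regular output loss (a scalar tensor) defined earlier.
        l2 = lambda_l2_reg * sum(
            tf.nn.l2_loss(tf_var)
            for tf_var in tf.trainable_variables()
            # Skip batch-norm parameters (tagged "noreg") and biases.
            if not ("noreg" in tf_var.name or "Bias" in tf_var.name)
        )
        loss += l2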
    

    where lambda_l2_reg is a small multiplier, e.g. 0.005.
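    The exclusion test above is plain substring matching on variable names, so it depends on how your TensorFlow version names things: as far as I know, older rnn_cell code named the LSTM bias "Bias" (under "Linear/"), while later 1.x versions use lowercase "bias". A minimal sketch of a case-insensitive variant of the same filter, run on hypothetical names:

```python
def should_regularize(var_name):
    """True if the variable should receive an L2 penalty.

    Skips names tagged "noreg" (e.g. batch-norm parameters) and any name
    containing "bias", regardless of case.
    """
    lowered = var_name.lower()
    return "noreg" not in lowered and "bias" not in lowered


# Hypothetical names in the style of older and newer TF 1.x graphs.
names = [
    "rnn/BasicLSTMCell/Linear/Matrix:0",
    "rnn/BasicLSTMCell/Linear/Bias:0",
    "rnn/basic_lstm_cell/kernel:0",
    "rnn/basic_lstm_cell/bias:0",
    "bn_noreg/gamma:0",
]

kept = [n for n in names if should_regularize(n)]
print(kept)
```

    Because this is substring matching, any weight whose name happens to contain "bias" would also be skipped; printing the kept names once is a cheap sanity check.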

    Adding this selection (the if clause that excludes some variables from regularization) once took my F1 score from 0.879 to 0.890 in a single test run, without re-tuning the lambda value in the config. That change covered both the batch-normalization parameters and the biases, and the network had other biases besides the LSTM's.

    According to this paper, regularizing the recurrent weights may help with exploding gradients.
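    To penalize only the recurrent weights, you need to know where they live. In TF 1.x, BasicLSTMCell multiplies the concatenation [inputs, h] by a single kernel of shape [input_depth + num_units, 4 * num_units], so the rows after the first input_depth rows act on the previous hidden state. The plain-Python sketch below slices those rows out of a hypothetical kernel (nested lists standing in for a tensor) and computes an L2 term on them alone. With tf.keras.layers.LSTM the input and recurrent weights are separate variables, and its recurrent_regularizer argument targets the latter directly.

```python
def recurrent_rows(kernel, input_depth):
    """Rows of a fused [input_depth + num_units, 4 * num_units] kernel that multiply h."""
    return kernel[input_depth:]


def l2_loss(rows):
    """Half the sum of squared entries (what tf.nn.l2_loss computes)."""
    return sum(x * x for row in rows for x in row) / 2.0


input_depth, num_units = 2, 1
# Hypothetical fused kernel: 2 input rows + 1 recurrent row, 4 gate columns each.
kernel = [
    [0.1, 0.2, 0.3, 0.4],   # input feature 0
    [0.5, 0.6, 0.7, 0.8],   # input feature 1
    [1.0, -1.0, 2.0, 0.0],  # previous hidden unit 0 (recurrent weights)
]
assert len(kernel) == input_depth + num_units
assert all(len(row) == 4 * num_units for row in kernel)

# Penalize only the recurrent part of the kernel.
recurrent_penalty = 0.005 * l2_loss(recurrent_rows(kernel, input_depth))
print(recurrent_penalty)
```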

    Also, according to this other paper, if you use dropout at all, it is better applied between stacked cells than inside the cells.
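    A toy illustration of that placement in plain Python: each layer is a hypothetical one-unit recurrent cell h = tanh(w_x * x + w_h * h), inverted dropout (scaled by 1 / keep_prob) is applied only to the signal passed from one layer up to the next, and the hidden state carried across time steps is never dropped. In TF 1.x this roughly corresponds to wrapping each cell in tf.nn.rnn_cell.DropoutWrapper(cell, output_keep_prob=...) before stacking them in a MultiRNNCell, rather than dropping the state inside the cell.

```python
import math
import random


def run_stacked(xs, weights, keep_prob, rng):
    """Run a stack of toy 1-unit RNN layers over the sequence xs.

    weights is a list of (w_x, w_h) pairs, one per layer. Dropout is applied
    only to the vertical connections between layers; each layer's recurrent
    state is carried across time steps untouched.
    """
    states = [0.0] * len(weights)
    outputs = []
    for x in xs:
        signal = x
        for layer, (w_x, w_h) in enumerate(weights):
            if layer > 0 and keep_prob < 1.0:
                # Inverted dropout on the input coming from the layer below.
                signal = signal / keep_prob if rng.random() < keep_prob else 0.0
            states[layer] = math.tanh(w_x * signal + w_h * states[layer])
            signal = states[layer]
        outputs.append(signal)
    return outputs


weights = [(0.5, 0.9), (1.2, 0.3)]
rng = random.Random(0)
print(run_stacked([1.0, 0.0, -1.0], weights, keep_prob=0.8, rng=rng))
```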

    Regarding exploding gradients: if you clip the gradients of a loss that already includes the L2 term, the regularization is also taken into account during clipping.
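    A small numeric sketch of that point, in plain Python with made-up numbers: since tf.nn.l2_loss(w) is sum(w**2) / 2, the penalty lambda * l2_loss(w) contributes lambda * w to the gradient, and that contribution becomes part of the global norm that global-norm clipping compares against its threshold.

```python
import math


def global_norm(grads):
    """Square root of the sum of squares of every gradient entry."""
    return math.sqrt(sum(g * g for g in grads))


lambda_l2_reg = 0.5  # deliberately large so the effect is easy to see
weights = [3.0, -4.0]
data_grads = [0.0, 0.0]  # hypothetical: the output loss is flat at this point

# d/dw of lambda * sum(w**2) / 2 is lambda * w, added to the data gradient.
total_grads = [g + lambda_l2_reg * w for g, w in zip(data_grads, weights)]

print(global_norm(data_grads))   # 0.0: nothing to clip without the L2 term
print(global_norm(total_grads))  # 2.5: the L2 term alone can trigger clipping
```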


    P.S. Here is the neural network I was working on: https://github.com/guillaume-chevalier/HAR-stacked-residual-bidir-LSTMs

  • 2021-02-05 15:37

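    TensorFlow has built-in helpers for the related exploding-gradient problem, such as tf.clip_by_global_norm, which rescales all gradients together so that their global L2 norm does not exceed a threshold (this is gradient clipping, not weight regularization):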

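        # ^^^ define your LSTM above here ^^^

        params = tf.trainable_variables()

        # Gradients of the loss with respect to every trainable variable.
        gradients = tf.gradients(self.losses, params)

        # Rescale all gradients jointly so their global L2 norm is at most
        # max_gradient_norm; `norm` is the global norm before clipping.
        clipped_gradients, norm = tf.clip_by_global_norm(gradients, max_gradient_norm)
        self.gradient_norms = norm

        opt = tf.train.GradientDescentOptimizer(self.learning_rate)
        self.updates = opt.apply_gradients(
            zip(clipped_gradients, params), global_step=self.global_step)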
    

    In your training step, run:

        outputs = session.run([self.updates, self.gradient_norms, self.losses], input_feed)
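    For intuition about what tf.clip_by_global_norm does to the gradient list, here is a plain-Python re-implementation of its documented formula: every tensor is multiplied by clip_norm / max(global_norm, clip_norm), so gradients are rescaled jointly (keeping their direction) only when the global norm exceeds the threshold, and the returned norm is the value before clipping. Gradients are flattened lists of floats here, and the name clip_by_global_norm_py is hypothetical.

```python
import math


def clip_by_global_norm_py(grads, clip_norm):
    """Mimic tf.clip_by_global_norm on lists of floats.

    Returns (clipped_grads, global_norm), where global_norm is measured
    before clipping, like the second value TensorFlow returns.
    """
    norm = math.sqrt(sum(x * x for g in grads for x in g))
    scale = clip_norm / max(norm, clip_norm)
    return [[x * scale for x in g] for g in grads], norm


grads = [[3.0], [4.0]]  # two "tensors" with global norm 5
clipped, norm = clip_by_global_norm_py(grads, clip_norm=1.0)
print(norm)     # 5.0
print(clipped)  # roughly [[0.6], [0.8]]

unchanged, _ = clip_by_global_norm_py(grads, clip_norm=10.0)
print(unchanged == grads)  # True: below the threshold nothing is rescaled
```

    Logging the returned pre-clipping norm (as self.gradient_norms does above) is a simple way to see how often clipping actually kicks in.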
    