Loss clipping in TensorFlow (on DeepMind's DQN)

渐次进展 2021-01-02 06:39

I am working on my own implementation of DeepMind's DQN paper in TensorFlow and am running into difficulty with the clipping of the loss function.

Here is an excerpt:

4 Answers
  • 2021-01-02 07:10
    1. No. They actually talk about error clipping, not loss clipping; as far as I know the two terms refer to the same thing, but the mix-up causes confusion. They DO NOT mean that a loss below -1 is clipped to -1 and a loss above +1 is clipped to +1, because that leads to zero gradients outside the error range [-1;1], as you realized. Instead, they suggest using a linear loss in place of the quadratic loss for error values < -1 and error values > 1.

    2. Compute the error value (r + \gamma \max_{a'} Q(s',a'; \theta_i^-) - Q(s,a; \theta_i)). If this error value is within the range [-1;1], square it; if it is < -1 or > 1, take its absolute value instead. If you use this as the loss function, the gradients outside the interval [-1;1] won't vanish.

    In order to have a "smooth-looking" compound loss function you could also replace the squared loss outside the error range [-1;1] with a first-order Taylor approximation at the border values -1 and 1. In this case, if e was your error value, you would square it in case e \in [-1;1], in case e < -1, replace it by -2e-1, in case e > 1, replace it by 2e-1.
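    This piecewise loss can be sketched in plain Python (the function name `piecewise_loss` is mine, not from the paper). For |e| > 1 the first-order Taylor extension works out to 2|e| - 1, which matches both -2e-1 (for e < -1) and 2e-1 (for e > 1):

    ```python
    def piecewise_loss(e):
        # Quadratic inside [-1, 1]; first-order Taylor extension outside,
        # which equals 2*|e| - 1. Both the value and the slope are
        # continuous at e = -1 and e = +1.
        if -1.0 <= e <= 1.0:
            return e * e
        return 2.0 * abs(e) - 1.0
    ```

    At e = ±1 both branches give 1, and the slope of the linear part is ±2, matching the slope of e² at the border, so the compound function is smooth there.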

  • 2021-01-02 07:13

    I suspect they mean that you should clip the gradient to [-1,1], not clip the loss function. Thus, you compute the gradient as usual, but then clip each component of the gradient to be in the range [-1,1] (so if it is larger than +1, you replace it with +1; if it is smaller than -1, you replace it with -1); and then you use the result in the gradient descent update step instead of using the unmodified gradient.

    Equivalently: Define a function f as follows:

    f(x) = x^2          if x in [-0.5,0.5]
    f(x) = |x| - 0.25   if x < -0.5 or x > 0.5
    

    Instead of using something of the form s^2 as the loss function (where s is some complicated expression), they suggest to use f(s) as the loss function. This is some kind of hybrid between squared-loss and absolute-value-loss: will behave like s^2 when s is small, but when s gets larger, it will behave like the absolute value (|s|).

    Notice that f has the nice property that its derivative always lies in the range [-1,1]:

    f'(x) = 2x    if x in [-0.5,0.5]
    f'(x) = +1    if x > +0.5
    f'(x) = -1    if x < -0.5
    

    Thus, when you take the gradient of this f-based loss function, the result will be the same as computing the gradient of a squared-loss and then clipping it.

    Thus, what they're doing is effectively replacing a squared-loss with a Huber loss. The function f is just two times the Huber loss for delta = 0.5.

    Now the point is that the following two alternatives are equivalent:

    • Use a squared loss function. Compute the gradient of this loss function, but clip the gradient to [-1,1] before doing the update step of the gradient descent.

    • Use a Huber loss function instead of a squared loss function. Compute the gradient of this loss function and use it unchanged in the gradient descent update.

    The former is easy to implement. The latter has nice properties (improves stability; it's better than absolute-value-loss because it avoids oscillating around the minimum). Because the two are equivalent, this means we get an easy-to-implement scheme that has the simplicity of squared-loss with the stability and robustness of the Huber loss.
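    A quick numerical check of this equivalence, in plain Python (helper names are mine):

    ```python
    def clipped_sq_grad(x):
        # Gradient of the squared loss x**2, clipped to [-1, 1].
        g = 2.0 * x
        return max(-1.0, min(1.0, g))

    def f_grad(x):
        # Exact gradient of f: 2x inside [-0.5, 0.5], +/-1 outside.
        if -0.5 <= x <= 0.5:
            return 2.0 * x
        return 1.0 if x > 0 else -1.0

    # The two agree at every test point:
    for x in [-2.0, -0.5, -0.1, 0.0, 0.3, 0.5, 4.0]:
        assert clipped_sq_grad(x) == f_grad(x)
    ```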

  • 2021-01-02 07:13
    1. In the DeepMind paper you reference, they limit the gradient of the loss. This prevents giant gradients and so improves robustness. They do this by using a quadratic loss function for errors inside a small range, and an absolute-value loss for larger errors.
    2. I suggest implementing the Huber loss function. Below is a Python TensorFlow implementation.

      import tensorflow as tf

      def huber_loss(y_true, y_pred, max_grad=1.):
          """Calculates the huber loss.

          Parameters
          ----------
          y_true: np.array, tf.Tensor
            Target value.
          y_pred: np.array, tf.Tensor
            Predicted value.
          max_grad: float, optional
            Positive floating point value. Represents the maximum possible
            gradient magnitude.

          Returns
          -------
          tf.Tensor
            The huber loss.
          """
          err = tf.abs(y_true - y_pred, name='abs')
          mg = tf.constant(max_grad, name='max_grad')

          # Quadratic branch for small errors, linear branch for large ones.
          lin = mg * (err - 0.5 * mg)
          quad = 0.5 * err * err

          return tf.where(err < mg, quad, lin)
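
    As a sanity check, here is the same piecewise rule in plain Python (my own reference version, not part of the answer):

    ```python
    def huber_ref(err, max_grad=1.0):
        # Same rule as the TensorFlow version above: 0.5*err**2 for
        # |err| < max_grad, linear with slope max_grad beyond that point.
        err = abs(err)
        if err < max_grad:
            return 0.5 * err * err
        return max_grad * (err - 0.5 * max_grad)
    ```

    At err = max_grad the two branches agree (both give 0.5 * max_grad**2), so the loss is continuous, and the slope of the linear part is exactly max_grad.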
      
  • 2021-01-02 07:23

    First of all, the code for the paper is available online, which constitutes an invaluable reference.

    Part 1

    If you take a look at the code you will see that, in nql:getQUpdate (NeuralQLearner.lua, line 180), they clip the error term of the Q-learning function:

    -- delta = r + (1-terminal) * gamma * max_a Q(s2, a) - Q(s, a)
    if self.clip_delta then
        delta[delta:ge(self.clip_delta)] = self.clip_delta
        delta[delta:le(-self.clip_delta)] = -self.clip_delta
    end
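
    In Python, the same element-wise clamping can be sketched with NumPy (my translation, not from the DeepMind code):

    ```python
    import numpy as np

    def clip_delta(delta, clip_value=1.0):
        # Mirrors the Lua snippet above: clamp each component of the
        # TD error delta into [-clip_value, +clip_value].
        return np.clip(delta, -clip_value, clip_value)
    ```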
    

    Part 2

    In TensorFlow, assuming the last layer of your neural network is called self.output, self.actions is a one-hot encoding of all actions, self.q_targets_ is a placeholder with the targets, and self.q is your computed Q:

    # The loss function
    one = tf.constant(1.0)
    delta = self.q - self.q_targets_
    absolute_delta = tf.abs(delta)
    delta = tf.where(
        absolute_delta < one,
        tf.square(delta),
        tf.ones_like(delta)  # clipped squared error: (+/-1)^2 = 1
    )
    

    Or, using tf.clip_by_value (and having an implementation closer to the original):

    delta = tf.clip_by_value(
        self.q - self.q_targets_,
        -1.0,
        +1.0
    )
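
    The effect of this variant can be seen in a standalone NumPy sketch (the values standing in for self.q and self.q_targets_ are made up):

    ```python
    import numpy as np

    # Hypothetical Q values and targets, in place of self.q and self.q_targets_.
    q = np.array([0.2, 3.0, -2.5])
    q_targets = np.array([0.0, 0.5, 0.0])

    # Clip the TD error into [-1, 1], as tf.clip_by_value does above.
    delta = np.clip(q - q_targets, -1.0, 1.0)
    ```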
    