In a simple multi-layer FFNN, only the ReLU activation function doesn't converge

误落风尘 2021-01-27 20:07

I'm learning TensorFlow and deep learning, and experimenting with various kinds of activation functions.

I created a multi-layer FFNN for the MNIST problem. Mostly based on the

1 Answer
  • 2021-01-27 20:59

    You are using the ReLU activation function, which computes the activation as follows:

    max(features, 0)

    Because ReLU has no upper bound, its outputs can become very large, which sometimes causes exploding gradients.
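    To make the point about ReLU concrete, here is a minimal pure-Python sketch for a single scalar. The names `relu` and `relu_grad` are illustrative, not TensorFlow API. It shows that positive inputs pass through unchanged with no upper bound, and that the derivative is 1 for every positive input, so the ReLU itself does not damp large values during backpropagation. Taking the derivative at exactly 0 to be 0 is a convention assumed here.

```python
def relu(x: float) -> float:
    """ReLU activation, i.e. max(features, 0) for a single scalar."""
    return max(x, 0.0)


def relu_grad(x: float) -> float:
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise.

    The value at exactly x == 0 is a convention; 0 is assumed here.
    """
    return 1.0 if x > 0 else 0.0


# ReLU has no upper bound: a huge pre-activation comes out unchanged,
# and its derivative stays 1, so the ReLU itself does not shrink the gradient.
for z in (-3.0, 0.5, 1e6):
    print(f"relu({z}) = {relu(z)}, relu'({z}) = {relu_grad(z)}")
```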

    The gradient descent optimizer updates each weight as follows:

    $$\Delta w_{ij} = -\eta \, \frac{\partial E_i}{\partial w_{ij}}$$

    where $\eta$ is the learning rate and $\partial E_i / \partial w_{ij}$ is the partial derivative of the loss with respect to the weight $w_{ij}$. When the ReLU outputs keep growing, these partial derivatives grow with them, which can cause exploding gradients. Because the update rule scales every step by $\eta$, you need to tune the learning rate to overcome this.
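    The effect of η in the update rule can be seen on a toy problem. This is a minimal sketch under a stated assumption: instead of the MNIST network it uses a single weight with the quadratic loss E(w) = 0.5 · c · w², where the hypothetical parameter `curvature` (c = 50) stands in for a steep loss surface with large gradients. Each step applies Δw = −η ∂E/∂w, which multiplies w by (1 − 50η), so the iteration diverges whenever η > 2/50 = 0.04.

```python
def gradient_descent(lr: float, curvature: float = 50.0, w0: float = 1.0, steps: int = 20) -> float:
    """Plain gradient descent on the toy loss E(w) = 0.5 * curvature * w**2.

    Returns the weight after `steps` updates; the minimum is at w = 0.
    `curvature` is an assumed stand-in for a steep loss surface.
    """
    w = w0
    for _ in range(steps):
        grad = curvature * w  # ∂E/∂w for this loss
        w -= lr * grad        # Δw = -η ∂E/∂w
    return w


for lr in (0.1, 0.01, 0.001):
    print(f"lr={lr}: w after 20 steps = {gradient_descent(lr):.3g}")
```

    With these assumed values, η = 0.1 makes |w| grow by a factor of 4 on every step, while 0.01 and 0.001 both shrink it toward the minimum at w = 0, the smaller rate more slowly.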

    A common rule of thumb is to reduce the learning rate step by step, usually by a factor of 10 each time.
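    The divide-by-10 rule can be written as a small search loop. This is a sketch, not an established API: `find_stable_lr` and `toy_train` are hypothetical names, and `toy_train` stands in for a real training run using a one-weight quadratic loss with an assumed curvature of 50. A NaN or infinite final loss is treated as divergence.

```python
import math
from typing import Callable, Optional


def find_stable_lr(train: Callable[[float], float], initial_lr: float = 0.1,
                   factor: float = 10.0, max_tries: int = 5) -> Optional[float]:
    """Divide the learning rate by `factor` until training stops diverging.

    `train` runs training with the given learning rate and returns the final
    loss; a NaN or infinite loss counts as divergence.
    Returns None if every candidate diverged.
    """
    lr = initial_lr
    for _ in range(max_tries):
        if math.isfinite(train(lr)):
            return lr
        lr /= factor  # the "reduce by a factor of 10" rule
    return None


def toy_train(lr: float, curvature: float = 50.0, steps: int = 1000) -> float:
    """Hypothetical stand-in for training: gradient descent on 0.5 * curvature * w**2."""
    w = 1.0
    for _ in range(steps):
        w -= lr * curvature * w  # Δw = -η ∂E/∂w, with ∂E/∂w = curvature * w
    # w * w rather than w ** 2: float multiplication overflows to inf instead of raising
    return 0.5 * curvature * w * w


print(find_stable_lr(toy_train))
```

    With the assumed curvature, 0.1 blows up and the loop settles on 0.01; on a real network the rate that works depends on the model and the data.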

    In your case, set the learning rate to 0.001; that should improve the accuracy.

    Hope this helps.
