In a simple multi-layer FFNN, only the ReLU activation function doesn't converge

Submitted by 三世轮回 on 2019-12-02 11:47:01

You are using the ReLU activation function, which computes the activation as follows:

max(features, 0)

Since its output is unbounded above (any positive value passes straight through), it can sometimes cause exploding gradients.
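As a quick illustration, here is a minimal NumPy sketch of ReLU (framework-agnostic; the input values are made up for the example):

```python
import numpy as np

def relu(features):
    # ReLU clips negative values to zero; positive values pass through
    # unchanged, so the output is unbounded above.
    return np.maximum(features, 0)

x = np.array([-2.0, 0.5, 3.0, 40.0])
print(relu(x))  # large positive inputs stay large: 0, 0.5, 3, 40
```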

The gradient descent optimizer updates the weights as follows:

Δw_ij = −η ∂E_i / ∂w_ij

where η is the learning rate and ∂E_i/∂w_ij is the partial derivative of the loss with respect to the weight. As the activations grow larger and larger, the partial derivatives also grow, which causes the exploding gradient. Therefore, as you can see in the equation, you need to tune the learning rate (η) to deal with this situation.
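To make the role of η concrete, here is a minimal NumPy sketch of that update rule (the weight and gradient values are hypothetical, chosen only to show how a large gradient combined with a large η produces a large jump):

```python
import numpy as np

def sgd_step(w, grad, eta):
    # Delta w_ij = -eta * dE_i/dw_ij: the step size scales directly with
    # the gradient, so a large gradient and a large eta together produce
    # a large (possibly divergent) weight update.
    return w - eta * grad

w = np.array([0.5, -1.2])
grad = np.array([30.0, -45.0])     # hypothetical large gradients
print(sgd_step(w, grad, eta=0.1))  # weights jump far from their old values
```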

A common rule of thumb is to reduce the learning rate by a factor of 10 each time.

In your case, setting the learning rate to 0.001 should improve the accuracy.
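Assuming you are using TensorFlow 1.x's GradientDescentOptimizer (which the wording above suggests, though your setup may differ), the change is just the learning_rate argument; the toy loss below only keeps the snippet self-contained:

```python
import tensorflow as tf  # assumes TensorFlow 1.x

# A toy scalar "loss" so the snippet runs on its own; in your code this
# would be the cross-entropy (or other) loss of your FFNN.
w = tf.Variable(5.0)
loss = tf.square(w)

# The key change: pass the smaller learning rate to the optimizer.
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.001)
train_op = optimizer.minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(5):
        sess.run(train_op)
    print(sess.run(w))  # w moves slowly but stably toward the minimum
```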

Hope this helps.
