In a simple multi-layer FFNN, only the ReLU activation function doesn't converge

误落风尘 2021-01-27 20:07

I'm learning TensorFlow and deep learning, and experimenting with various kinds of activation functions.

I created a multi-layer FFNN for the MNIST problem. Mostly based on the

1 Answer
  •  鱼传尺愫
    2021-01-27 20:59

    You are using the ReLU activation function, which computes the activation as follows:

    max(features, 0)

    Since ReLU is unbounded above (positive inputs pass through unchanged), its outputs can grow very large, and this sometimes causes exploding gradients.
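    For illustration, here is a minimal sketch (assuming TensorFlow 2.x with eager execution; not your actual model) showing that ReLU passes large positive values through unchanged, while a bounded activation such as sigmoid saturates:

        import tensorflow as tf

        # ReLU clips negatives to zero and leaves positive inputs unchanged,
        # so its output is unbounded above.
        x = tf.constant([-2.0, -0.5, 0.0, 3.0, 50.0])
        print(tf.nn.relu(x).numpy())       # [ 0.  0.  0.  3. 50.]

        # A bounded activation such as sigmoid saturates near 1 instead.
        print(tf.math.sigmoid(x).numpy())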

    The gradient descent optimizer updates the weights as follows:

    Δw_ij = −η · ∂E_i/∂w_ij

    where η is the learning rate and ∂E_i/∂w_ij is the partial derivative of the loss with respect to the weight. When the activations get larger and larger, the partial derivatives also get larger, which causes the exploding gradient. Therefore, as you can see from the equation, you need to tune the learning rate (η) to keep the weight updates under control.
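    As a rough illustration (with hypothetical numbers, not values from your model), a single update step shows how a large gradient combined with a large learning rate throws the weight far off, while a smaller η keeps the step small:

        # One gradient-descent step: w_new = w - eta * dE/dw (hypothetical values).
        w, grad = 0.5, 120.0
        for eta in (0.1, 0.001):
            print(eta, w - eta * grad)   # 0.1 -> -11.5 (overshoot), 0.001 -> 0.38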

    A common rule is to reduce the learning rate, usually by a factor of 10 each time.

    In your case, setting the learning rate to 0.001 should improve the accuracy.
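    A minimal sketch of that fix, assuming a Keras-style MNIST model trained with plain SGD (your actual layer sizes and optimizer code may differ):

        import tensorflow as tf

        # Hypothetical MNIST classifier; the point is only the reduced learning rate.
        model = tf.keras.Sequential([
            tf.keras.layers.Flatten(input_shape=(28, 28)),
            tf.keras.layers.Dense(256, activation="relu"),
            tf.keras.layers.Dense(128, activation="relu"),
            tf.keras.layers.Dense(10, activation="softmax"),
        ])
        model.compile(
            optimizer=tf.keras.optimizers.SGD(learning_rate=0.001),  # reduced learning rate
            loss="sparse_categorical_crossentropy",
            metrics=["accuracy"],
        )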

    Hope this helps.
