Should we do learning rate decay for the Adam optimizer?

深忆病人 2021-01-29 19:10

I'm training a network for image localization with the Adam optimizer, and someone suggested I use exponential decay. I don't want to try that because the Adam optimizer itself decays the learning rate.

4 Answers
  •  一整个雨季
    2021-01-29 19:31

    It depends. Adam updates each parameter with an individual learning rate, so every parameter in the network has its own learning rate associated with it.

    But each per-parameter learning rate is computed using lambda (the initial learning rate) as an upper limit. This means that every individual learning rate can vary from 0 (no update) to lambda (the maximum update).
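
    Concretely, a sketch of Adam's update rule (with \lambda as the step size, \hat{m}_t and \hat{v}_t the bias-corrected moment estimates, and \epsilon a small constant) is:

        \theta_t = \theta_{t-1} - \lambda \cdot \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}

    Since the ratio \hat{m}_t / (\sqrt{\hat{v}_t} + \epsilon) is approximately bounded by 1 in magnitude, the effective step taken for each parameter stays roughly within [0, \lambda].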

    It's true that the learning rates adapt themselves during training, but if you want to be sure that every update step does not exceed lambda, you can also lower lambda itself, using exponential decay or any other schedule, as sketched below. This can help reduce the loss in the last stages of training, when the loss computed with the previous (larger) lambda has stopped decreasing.
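
    As a concrete illustration, here is a minimal PyTorch sketch that attaches an exponential decay schedule to Adam so that lambda itself shrinks each epoch (the framework choice, the model, the data, and the decay factor gamma=0.95 are all placeholder assumptions, not from the question):

        import torch

        model = torch.nn.Linear(10, 2)  # placeholder model
        # lr plays the role of lambda, the upper limit on each update step
        optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
        scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.95)

        for epoch in range(20):
            x = torch.randn(32, 10)            # dummy batch
            loss = model(x).pow(2).mean()      # dummy loss
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()    # Adam's per-parameter update, capped by the current lr
            scheduler.step()    # exponentially lower lr (lambda) after each epoch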
