Should we do learning rate decay for the Adam optimizer?

深忆病人 2021-01-29 19:10

I'm training a network for image localization with the Adam optimizer, and someone suggested that I use exponential decay. I don't want to try that because the Adam optimizer itself decays the learning rate.

4 Answers
  •  孤独总比滥情好
    2021-01-29 19:45

    Yes, absolutely. In my own experience, it's very useful to use Adam with learning rate decay. Without decay, you have to set a very small learning rate so the loss won't start to diverge after it has decreased to a point. Here is code for using Adam with exponential learning rate decay in TensorFlow (1.x). Hope it is helpful to someone.

    import tensorflow as tf  # TF 1.x API
    learning_rate, adam_epsilon = 1e-3, 1e-8       # base hyperparameters
    global_step = tf.Variable(0, trainable=False)  # advanced by minimize()
    decayed_lr = tf.train.exponential_decay(learning_rate,       # lr *= 0.95 every
                                            global_step, 10000,  # 10,000 steps
                                            0.95, staircase=True)
    opt = tf.train.AdamOptimizer(decayed_lr, epsilon=adam_epsilon)
    # train_op = opt.minimize(loss, global_step=global_step)  # pass global_step so the decay advances
    
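    The snippet above uses the TF 1.x tf.train API. On TensorFlow 2.x, a roughly equivalent sketch passes a LearningRateSchedule to the Keras Adam optimizer; here initial_lr and the epsilon value are placeholder hyperparameters (not from the original answer), while the 10,000-step, 0.95 staircase schedule mirrors the snippet above:

    import tensorflow as tf

    initial_lr = 1e-3  # placeholder base learning rate
    lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
        initial_learning_rate=initial_lr,
        decay_steps=10000,   # same schedule as the TF1 snippet above
        decay_rate=0.95,
        staircase=True)
    opt = tf.keras.optimizers.Adam(learning_rate=lr_schedule, epsilon=1e-8)
    # opt can then go to model.compile(optimizer=opt, ...) or a custom GradientTape loop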
