How should the learning rate change as the batch size changes?

孤城傲影 · 2021-01-29 23:49

When I increase or decrease the batch size of the mini-batches used in SGD, should I change the learning rate? If so, how?

For reference, I was discussing with someone, and it w…

2 Answers
  •  不思量自难忘° · 2021-01-30 00:01

    Apart from the papers mentioned in Dmytro's answer, you can refer to: Jastrzębski, S., Kenton, Z., Arpit, D., Ballas, N., Fischer, A., Bengio, Y., & Storkey, A. (2018). Width of Minima Reached by Stochastic Gradient Descent is Influenced by Learning Rate to Batch Size Ratio. The authors give a mathematical and empirical foundation for the idea that the ratio of learning rate to batch size influences the generalization capacity of DNNs. They show that this ratio plays a major role in the width of the minima found by SGD: the higher the ratio, the wider the minima and the better the generalization.
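
    To make the ratio idea concrete, here is a minimal sketch of the linear-scaling heuristic it suggests: when the batch size changes, rescale the learning rate so that lr / batch_size stays constant. The function and parameter names (scaled_learning_rate, base_lr, base_batch_size) are hypothetical illustrations, not from the paper.

    ```python
    # Minimal sketch: keep the learning-rate-to-batch-size ratio constant
    # when the batch size changes (the "linear scaling" heuristic implied
    # by the paper's ratio argument). Names here are illustrative only.

    def scaled_learning_rate(base_lr: float, base_batch_size: int,
                             new_batch_size: int) -> float:
        """Rescale the learning rate so that lr / batch_size is unchanged."""
        ratio = base_lr / base_batch_size  # the ratio argued to control minima width
        return ratio * new_batch_size

    # Example: a learning rate of 0.1 tuned at batch size 256; doubling the
    # batch size to 512 doubles the learning rate to 0.2, preserving the ratio.
    print(scaled_learning_rate(0.1, 256, 512))  # 0.2
    ```

    Note this is only a first-order heuristic: in practice very large batches often need a warm-up phase or a smaller-than-linear scaling, so treat the constant-ratio rule as a starting point to tune from rather than a guarantee.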
