What is the correct way to implement gradient accumulation in PyTorch?

Asked by 独厮守ぢ on 2021-02-12 15:17

Broadly there are two ways:

  1. Call loss.backward() on every batch, but only call optimizer.step() and optimizer.zero_grad() every N batches, so the gradients of the N batches accumulate before a single weight update (see the sketch below).
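
For reference, here is a minimal, self-contained sketch of option 1. The model, optimizer, loss function, dummy data, and the value of accumulation_steps are all illustrative assumptions, not part of the original question. Note that loss.backward() sums gradients into each parameter's .grad, so dividing the loss by N is a common (but optional) normalization to keep the update comparable to one large batch.

```python
import torch
import torch.nn as nn

# Hypothetical setup for illustration only.
model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

accumulation_steps = 4  # N: micro-batches per optimizer update (assumed value)
data = [(torch.randn(8, 10), torch.randn(8, 1)) for _ in range(16)]  # dummy batches

optimizer.zero_grad()
for i, (inputs, targets) in enumerate(data):
    outputs = model(inputs)
    # Dividing by N makes the accumulated gradient approximate the mean over
    # the effective large batch; loss.backward() alone *sums* gradients.
    loss = loss_fn(outputs, targets) / accumulation_steps
    loss.backward()  # accumulate gradients on every batch

    if (i + 1) % accumulation_steps == 0:
        optimizer.step()       # update weights every N batches
        optimizer.zero_grad()  # reset the accumulated gradients
```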
