I have a use case for the current setup in PyTorch.
If one is using a recurrent neural network (RNN) that makes a prediction at every step, one might want a hyperparameter that controls how many time steps of gradients to accumulate. Not zeroing the gradients at every time step lets one use backpropagation through time (BPTT) in interesting and novel ways.
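For example, here is a minimal sketch of what I mean (the names here, like `zero_every`, are just placeholders I made up for illustration, not anything built into PyTorch): gradients from each step's `backward()` keep adding into `.grad`, and you choose when to step and zero.

```python
import torch
import torch.nn as nn

seq_len, batch, in_dim, hid_dim = 20, 8, 10, 32
zero_every = 5  # hypothetical hyperparameter: accumulate gradients over 5 steps

cell = nn.RNNCell(in_dim, hid_dim)
head = nn.Linear(hid_dim, 1)
opt = torch.optim.SGD(list(cell.parameters()) + list(head.parameters()), lr=0.01)

x = torch.randn(seq_len, batch, in_dim)  # dummy per-step inputs
y = torch.randn(seq_len, batch, 1)       # dummy per-step targets
h = torch.zeros(batch, hid_dim)

for t in range(seq_len):
    h = cell(x[t], h)
    loss = nn.functional.mse_loss(head(h), y[t])
    loss.backward()        # this step's gradients add into .grad (no zeroing yet)
    h = h.detach()         # truncate the graph so each backward covers one step
    if (t + 1) % zero_every == 0:
        opt.step()         # update using gradients accumulated over `zero_every` steps
        opt.zero_grad()    # only now clear the accumulated gradients
```

Because zeroing is a separate call, the accumulation window is entirely up to the user.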
If you would like more info on BPTT or RNNs, see the articles Recurrent Neural Networks Tutorial, Part 3 – Backpropagation Through Time and Vanishing Gradients or The Unreasonable Effectiveness of Recurrent Neural Networks.