gradient-descent

Batch size for Stochastic gradient descent is length of training data and not 1?

让人想犯罪 __ submitted on 2021-02-15 07:10:25
Question: I am trying to plot the different learning outcomes when using batch gradient descent, stochastic gradient descent, and mini-batch stochastic gradient descent. Everywhere I look, I read that batch_size=1 is the same as plain SGD and that batch_size=len(train_data) is the same as batch gradient descent. I know that stochastic gradient descent uses only one single data sample for every update and that batch gradient descent uses the entire training data set to compute the
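As a side note for readers comparing the three variants: the only difference is how many samples feed each parameter update. Below is a minimal NumPy sketch, not taken from the question; the function names and the least-squares objective are illustrative assumptions. Here batch_size=None gives full-batch gradient descent, batch_size=1 gives plain SGD, and anything in between gives mini-batch SGD.

import numpy as np

def grad_step(w, X_batch, y_batch, lr):
    # One gradient-descent update on the mean squared error of this batch.
    pred = X_batch @ w
    grad = X_batch.T @ (pred - y_batch) / len(y_batch)
    return w - lr * grad

def train(X, y, lr=0.01, epochs=10, batch_size=None):
    # batch_size=None -> batch GD, batch_size=1 -> SGD, otherwise mini-batch SGD.
    n = len(y)
    bs = n if batch_size is None else batch_size
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        idx = np.random.permutation(n)
        for start in range(0, n, bs):
            sel = idx[start:start + bs]
            w = grad_step(w, X[sel], y[sel], lr)
    return w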

Understanding accumulated gradients in PyTorch

感情迁移 submitted on 2021-02-05 20:34:09
Question: I am trying to understand the inner workings of gradient accumulation in PyTorch. My question is somewhat related to these two: Why do we need to call zero_grad() in PyTorch? Why do we need to explicitly call zero_grad()? Comments on the accepted answer to the second question suggest that accumulated gradients can be used if a minibatch is too large to perform a gradient update in a single forward pass, and thus has to be split into multiple sub-batches. Consider the following toy example:
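For readers, a minimal sketch of the accumulation pattern those comments describe: split the large minibatch into sub-batches, call backward() on each so the .grad buffers add up, then apply a single optimizer step. The model, tensor sizes, and accum_steps value below are placeholder assumptions, not the asker's code.

import torch
import torch.nn as nn

model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.MSELoss()

full_x = torch.randn(8, 10)   # one "large" minibatch of 8 samples
full_y = torch.randn(8, 1)
accum_steps = 4               # split it into 4 sub-batches of 2

optimizer.zero_grad()
for sub_x, sub_y in zip(full_x.chunk(accum_steps), full_y.chunk(accum_steps)):
    loss = criterion(model(sub_x), sub_y) / accum_steps  # scale so the accumulated sum matches the full-batch mean
    loss.backward()            # gradients accumulate in each parameter's .grad buffer
optimizer.step()               # one update, as if the whole minibatch had fit in memory
optimizer.zero_grad()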

Implementing a linear regression using gradient descent

被刻印的时光 ゝ submitted on 2021-01-29 20:00:35
Question: I'm trying to implement linear regression with gradient descent as explained in this article (https://towardsdatascience.com/linear-regression-using-gradient-descent-97a6c8700931). I've followed the implementation to the letter, yet my results overflow after a few iterations. I'm trying to get approximately this result: y = -0.02x + 8499.6. The code: package main import ( "encoding/csv" "fmt" "strconv" "strings" ) const ( iterations = 1000 learningRate = 0.0001 ) func computePrice(m, x, c
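For reference, gradient descent on y = m*x + c minimizes the mean squared error by stepping along its partial derivatives with respect to m and c. Below is a minimal NumPy sketch of that update rule; it is illustrative, not the asker's Go code, and the function name is an assumption.

import numpy as np

def fit_line(x, y, lr=0.0001, iterations=1000):
    # Gradient descent on the mean squared error of y = m*x + c.
    m, c = 0.0, 0.0
    n = float(len(x))
    for _ in range(iterations):
        pred = m * x + c
        dm = (-2.0 / n) * np.sum(x * (y - pred))  # d(MSE)/dm
        dc = (-2.0 / n) * np.sum(y - pred)        # d(MSE)/dc
        m -= lr * dm
        c -= lr * dc
    return m, c

With large, unscaled x values the step lr * dm can grow on every iteration and overflow, which is one common explanation for the symptom in the question; normalizing x or lowering the learning rate is the usual first thing to try.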

Gradient accumulation in an RNN

点点圈 submitted on 2021-01-28 01:52:36
Question: I ran into some memory issues (GPU) when running a large RNN network, but I want to keep my batch size reasonable, so I wanted to try gradient accumulation. In a network where you predict the output in one go, that seems self-evident, but in an RNN you do multiple forward passes for each input step. Because of that, I fear that my implementation does not work as intended. I started from user albanD's excellent examples here, but I think they should be modified when using an RNN. The reason
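One way the accumulation pattern can carry over to an RNN, sketched below as an assumption rather than a fix for the asker's code: treat each sub-batch of whole sequences as an independent forward pass (nn.RNN unrolls every time step internally), scale each loss by the number of sub-batches, and step once after all of them. The model sizes and names are placeholders.

import torch
import torch.nn as nn

rnn = nn.RNN(input_size=5, hidden_size=16, batch_first=True)
head = nn.Linear(16, 1)
params = list(rnn.parameters()) + list(head.parameters())
optimizer = torch.optim.SGD(params, lr=0.01)
criterion = nn.MSELoss()

# A logical batch of 8 sequences (length 20) split into 4 sub-batches of 2 to save GPU memory.
seqs = torch.randn(8, 20, 5)
targets = torch.randn(8, 1)
accum_steps = 4

optimizer.zero_grad()
for sub_x, sub_y in zip(seqs.chunk(accum_steps), targets.chunk(accum_steps)):
    out, _ = rnn(sub_x)                            # full forward over every time step of the sub-batch
    loss = criterion(head(out[:, -1]), sub_y) / accum_steps
    loss.backward()                                # gradients from each sub-batch accumulate
optimizer.step()                                   # one update for the whole logical batch
optimizer.zero_grad()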