gradient-descent

Batch size for Stochastic gradient descent is length of training data and not 1?

让人想犯罪 __ submitted on 2021-02-15 07:10:25
Question: I am trying to plot the different learning outcomes when using batch gradient descent, stochastic gradient descent, and mini-batch stochastic gradient descent. Everywhere I look, I read that batch_size=1 is the same as plain SGD and that batch_size=len(train_data) is the same as batch gradient descent. I know that stochastic gradient descent uses only one single data sample for every update and that batch gradient descent uses the entire training data set to compute the
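As a side note for readers comparing the three variants: the only difference is how many samples feed each parameter update. Below is a minimal NumPy sketch, not taken from the question; the function names and the least-squares objective are illustrative assumptions. Here batch_size=None gives full-batch gradient descent, batch_size=1 gives plain SGD, and anything in between gives mini-batch SGD.

import numpy as np

def grad_step(w, X_batch, y_batch, lr):
    # One gradient-descent update on the mean squared error of this batch.
    pred = X_batch @ w
    grad = X_batch.T @ (pred - y_batch) / len(y_batch)
    return w - lr * grad

def train(X, y, lr=0.01, epochs=10, batch_size=None):
    # batch_size=None -> batch GD, batch_size=1 -> SGD, otherwise mini-batch SGD.
    n = len(y)
    bs = n if batch_size is None else batch_size
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        idx = np.random.permutation(n)
        for start in range(0, n, bs):
            sel = idx[start:start + bs]
            w = grad_step(w, X[sel], y[sel], lr)
    return w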

Understanding accumulated gradients in PyTorch

感情迁移 submitted on 2021-02-05 20:34:09
Question: I am trying to understand the inner workings of gradient accumulation in PyTorch. My question is somewhat related to these two: Why do we need to call zero_grad() in PyTorch? Why do we need to explicitly call zero_grad()? Comments on the accepted answer to the second question suggest that accumulated gradients can be used if a minibatch is too large to perform a gradient update in a single forward pass, and thus has to be split into multiple sub-batches. Consider the following toy example:
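For readers, a minimal sketch of the accumulation pattern those comments describe: split the large minibatch into sub-batches, call backward() on each so the .grad buffers add up, then apply a single optimizer step. The model, tensor sizes, and accum_steps value below are placeholder assumptions, not the asker's code.

import torch
import torch.nn as nn

model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.MSELoss()

full_x = torch.randn(8, 10)   # one "large" minibatch of 8 samples
full_y = torch.randn(8, 1)
accum_steps = 4               # split it into 4 sub-batches of 2

optimizer.zero_grad()
for sub_x, sub_y in zip(full_x.chunk(accum_steps), full_y.chunk(accum_steps)):
    loss = criterion(model(sub_x), sub_y) / accum_steps  # scale so the accumulated sum matches the full-batch mean
    loss.backward()            # gradients accumulate in each parameter's .grad buffer
optimizer.step()               # one update, as if the whole minibatch had fit in memory
optimizer.zero_grad()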

Implementing a linear regression using gradient descent

被刻印的时光 ゝ submitted on 2021-01-29 20:00:35
Question: I'm trying to implement linear regression with gradient descent as explained in this article (https://towardsdatascience.com/linear-regression-using-gradient-descent-97a6c8700931). I've followed the implementation to the letter, yet my results overflow after a few iterations. I'm trying to get approximately this result: y = -0.02x + 8499.6. The code: package main import ( "encoding/csv" "fmt" "strconv" "strings" ) const ( iterations = 1000 learningRate = 0.0001 ) func computePrice(m, x, c
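For reference, gradient descent on y = m*x + c minimizes the mean squared error by stepping along its partial derivatives with respect to m and c. Below is a minimal NumPy sketch of that update rule; it is illustrative, not the asker's Go code, and the function name is an assumption.

import numpy as np

def fit_line(x, y, lr=0.0001, iterations=1000):
    # Gradient descent on the mean squared error of y = m*x + c.
    m, c = 0.0, 0.0
    n = float(len(x))
    for _ in range(iterations):
        pred = m * x + c
        dm = (-2.0 / n) * np.sum(x * (y - pred))  # d(MSE)/dm
        dc = (-2.0 / n) * np.sum(y - pred)        # d(MSE)/dc
        m -= lr * dm
        c -= lr * dc
    return m, c

With large, unscaled x values the step lr * dm can grow on every iteration and overflow, which is one common explanation for the symptom in the question; normalizing x or lowering the learning rate is the usual first thing to try.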

Gradient accumulation in an RNN

点点圈 submitted on 2021-01-28 01:52:36
Question: I ran into some memory issues (GPU) when running a large RNN network, but I want to keep my batch size reasonable, so I wanted to try gradient accumulation. In a network where you predict the output in one go, that seems self-evident, but in an RNN you do multiple forward passes for each input step. Because of that, I fear that my implementation does not work as intended. I started from user albanD's excellent examples here, but I think they should be modified when using an RNN. The reason
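One way the accumulation pattern can carry over to an RNN, sketched below as an assumption rather than a fix for the asker's code: treat each sub-batch of whole sequences as an independent forward pass (nn.RNN unrolls every time step internally), scale each loss by the number of sub-batches, and step once after all of them. The model sizes and names are placeholders.

import torch
import torch.nn as nn

rnn = nn.RNN(input_size=5, hidden_size=16, batch_first=True)
head = nn.Linear(16, 1)
params = list(rnn.parameters()) + list(head.parameters())
optimizer = torch.optim.SGD(params, lr=0.01)
criterion = nn.MSELoss()

# A logical batch of 8 sequences (length 20) split into 4 sub-batches of 2 to save GPU memory.
seqs = torch.randn(8, 20, 5)
targets = torch.randn(8, 1)
accum_steps = 4

optimizer.zero_grad()
for sub_x, sub_y in zip(seqs.chunk(accum_steps), targets.chunk(accum_steps)):
    out, _ = rnn(sub_x)                            # full forward over every time step of the sub-batch
    loss = criterion(head(out[:, -1]), sub_y) / accum_steps
    loss.backward()                                # gradients from each sub-batch accumulate
optimizer.step()                                   # one update for the whole logical batch
optimizer.zero_grad()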