Gradient Descent vs Stochastic Gradient Descent algorithms
Question: I tried to train a feedforward neural network on the MNIST handwritten digits dataset (60K training samples). On every epoch I iterated over all the training samples, performing backpropagation for each sample. The runtime is, of course, far too long. Is the algorithm I ran called Gradient Descent? I read that for large datasets, Stochastic Gradient Descent can improve the runtime dramatically. What should I do in order to use Stochastic Gradient Descent? Should I update the weights after each randomly chosen sample (or mini-batch) instead of after a full pass over the training set?
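For concreteness, here is a minimal sketch (not from the original post) contrasting the two update rules. It uses a linear model with a squared-error gradient as a stand-in for the actual network and backpropagation; the names X, y, w, grad, lr, and batch_size are all hypothetical placeholders:

    import numpy as np

    # Hypothetical toy data standing in for MNIST (60K samples, 784 features).
    rng = np.random.default_rng(0)
    X = rng.normal(size=(60_000, 784))
    y = rng.normal(size=(60_000, 1))
    w = np.zeros((784, 1))   # model parameters
    lr = 0.01                # learning rate

    def grad(w, X, y):
        # Mean-squared-error gradient for a linear model
        # (a stand-in for backpropagation through a real network).
        return 2 * X.T @ (X @ w - y) / len(X)

    # (Batch) gradient descent: ONE update per epoch, computed on ALL samples.
    for epoch in range(3):
        w -= lr * grad(w, X, y)

    # Stochastic (mini-batch) gradient descent: many updates per epoch,
    # each computed on a small, freshly shuffled subset of the data.
    batch_size = 64
    for epoch in range(3):
        perm = rng.permutation(len(X))
        for start in range(0, len(X), batch_size):
            idx = perm[start:start + batch_size]
            w -= lr * grad(w, X[idx], y[idx])

Updating on shuffled mini-batches rather than single samples is the usual middle ground: it keeps the variance of the gradient estimate manageable while still giving many cheap updates per epoch.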