gradient-descent

Gradient Descent vs Stochastic Gradient Descent algorithms

Submitted by 允我心安 on 2020-01-01 05:49:05
Question: I tried to train a feed-forward neural network on the MNIST handwritten digits dataset (60K training samples). On every epoch I iterated over all the training samples, performing backpropagation for each sample. The runtime is of course too long. Is the algorithm I ran called Gradient Descent? I read that for large datasets, using Stochastic Gradient Descent can improve the runtime dramatically. What should I do in order to use Stochastic Gradient Descent? Should I
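For reference, here is a minimal sketch (not from the original post) contrasting the two update schedules on a linear-regression objective; the gradient helper, learning rate, and batch size are illustrative assumptions. Full-batch gradient descent makes one update per pass over all samples, while SGD updates after every example or small mini-batch:

```python
import numpy as np

def gradient(w, X, y):
    # illustrative helper: gradient of a least-squares loss at w over (X, y)
    return X.T @ (X @ w - y) / len(y)

def batch_gd(w, X, y, lr=0.01, epochs=10):
    # full-batch gradient descent: one update per full pass over the data
    for _ in range(epochs):
        w -= lr * gradient(w, X, y)
    return w

def sgd(w, X, y, lr=0.01, epochs=10, batch_size=32):
    # stochastic / mini-batch gradient descent: many cheap updates per epoch
    n = len(y)
    for _ in range(epochs):
        idx = np.random.permutation(n)          # reshuffle every epoch
        for start in range(0, n, batch_size):
            b = idx[start:start + batch_size]
            w -= lr * gradient(w, X[b], y[b])   # update on a small slice only
    return w
```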

Why do we need to explicitly call zero_grad()? [duplicate]

Submitted by 北慕城南 on 2019-12-29 12:11:52
Question: This question already has an answer here: Why do we need to call zero_grad() in PyTorch? (1 answer). Closed last month. Why do we need to explicitly zero the gradients in PyTorch? Why can't gradients be zeroed when loss.backward() is called? What scenario is served by keeping the gradients on the graph and asking the user to explicitly zero them? Answer 1: We explicitly need to call zero_grad() because, after loss.backward() (when gradients are computed), we need to use optimizer.step()
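The answer cuts off, but the gist fits in a minimal PyTorch training-loop sketch (the model, data, and hyperparameters here are placeholders, not from the post): .backward() adds into each parameter's .grad, so the loop must clear the old gradients before computing new ones.

```python
import torch

model = torch.nn.Linear(10, 1)                     # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = torch.nn.MSELoss()

x = torch.randn(32, 10)                            # dummy batch
y = torch.randn(32, 1)

for step in range(100):
    optimizer.zero_grad()    # clear gradients accumulated by the last backward()
    loss = loss_fn(model(x), y)
    loss.backward()          # accumulates (adds) gradients into each param.grad
    optimizer.step()         # update parameters using the fresh gradients
```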

Machine learning - Linear regression using batch gradient descent

Submitted by 大兔子大兔子 on 2019-12-28 03:32:16
Question: I am trying to implement batch gradient descent on a data set with a single feature and multiple training examples (m). The normal equation gives me the right answer, but the MATLAB code below, which performs batch gradient descent, gives the wrong one.

```matlab
function [theta] = gradientDescent(X, y, theta, alpha, iterations)
m = length(y);
delta = zeros(2,1);
for iter = 1:1:iterations
    for i = 1:1:m
        delta(1,1) = delta(1,1) + (X(i,:)*theta - y(i,1));
        delta(2,1) = delta(2,1) + ((X(i,:)*theta
```
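The excerpt is truncated, so as a point of comparison here is a vectorized batch-gradient-descent sketch in NumPy (a hedged re-implementation, not the poster's fixed code); the usual pitfalls in the looped version are forgetting to scale by alpha/m and not updating both components of theta simultaneously:

```python
import numpy as np

def gradient_descent(X, y, theta, alpha, iterations):
    # X: (m, 2) design matrix with a leading column of ones
    # y: (m, 1) targets, theta: (2, 1) parameters
    m = len(y)
    for _ in range(iterations):
        delta = X.T @ (X @ theta - y) / m   # (2, 1) gradient over all m samples
        theta = theta - alpha * delta       # simultaneous update of both params
    return theta
```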

Cost function in logistic regression gives NaN as a result

Submitted by 被刻印的时光 ゝ on 2019-12-28 02:50:27
Question: I am implementing logistic regression using batch gradient descent. There are two classes, 1 and 0, into which the input samples are to be classified. While training the data, I am using the following sigmoid function:

```matlab
t = 1 ./ (1 + exp(-z));
```

where z = x*theta. And I am using the following cost function to calculate cost, to determine when to stop training:

```matlab
function cost = computeCost(x, y, theta)
htheta = sigmoid(x*theta);
cost = sum(-y .* log(htheta) - (1-y) .* log(1-htheta));
```
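A NaN cost almost always means log(0): when x*theta is large in magnitude, the sigmoid saturates to exactly 0 or 1 in floating point. A common fix, shown here as a hedged NumPy sketch (the eps value is an assumption), is to clip the predictions away from the boundaries before taking logs:

```python
import numpy as np

def compute_cost(x, theta, y, eps=1e-12):
    htheta = 1.0 / (1.0 + np.exp(-(x @ theta)))   # sigmoid of x*theta
    htheta = np.clip(htheta, eps, 1 - eps)        # keep log() away from 0 and 1
    return np.sum(-y * np.log(htheta) - (1 - y) * np.log(1 - htheta))
```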

TensorFlow: optimizer gives nan as output

Submitted by 最后都变了- on 2019-12-24 17:07:50
Question: I am running a very simple TensorFlow program:

```python
W = tf.Variable([.3], tf.float32)
b = tf.Variable([-.3], tf.float32)
x = tf.placeholder(tf.float32)
linear_model = W*x + b
y = tf.placeholder(tf.float32)
squared_error = tf.square(linear_model - y)
loss = tf.reduce_sum(squared_error)
optimizer = tf.train.GradientDescentOptimizer(0.1)
train = optimizer.minimize(loss)
init = tf.global_variables_initializer()
with tf.Session() as s:
    file_writer = tf.summary.FileWriter('../../tfLogs/graph', s.graph)
    s
```
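The excerpt is truncated, but with tf.reduce_sum and a learning rate of 0.1 this setup commonly diverges to nan, because the summed gradient grows with the number of samples fed in. A hedged variant of the same TF 1.x program (the training data below are illustrative) that averages the loss and shrinks the step size:

```python
import tensorflow as tf  # TF 1.x API, matching the question

W = tf.Variable([.3], dtype=tf.float32)
b = tf.Variable([-.3], dtype=tf.float32)
x = tf.placeholder(tf.float32)
y = tf.placeholder(tf.float32)
linear_model = W * x + b

# reduce_mean keeps the gradient scale independent of batch size;
# with reduce_sum and lr=0.1 the updates can easily blow up to nan
loss = tf.reduce_mean(tf.square(linear_model - y))
train = tf.train.GradientDescentOptimizer(0.01).minimize(loss)

with tf.Session() as s:
    s.run(tf.global_variables_initializer())
    for _ in range(1000):
        s.run(train, {x: [1, 2, 3, 4], y: [0, -1, -2, -3]})
    print(s.run([W, b]))
```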

Does Stochastic Gradient Descent even work with TensorFlow?

Submitted by 老子叫甜甜 on 2019-12-23 16:26:56
Question: I designed an MLP, fully connected, with two hidden layers and one output layer. I get a nice learning curve if I use batch or mini-batch gradient descent, but a flat line (violet) while performing stochastic gradient descent. What did I get wrong? In my understanding, I do stochastic gradient descent with TensorFlow if I provide just one training example each train step, like:

```python
X = tf.placeholder("float", [None, amountInput], name="Input")
Y = tf.placeholder("float", [None, amountOutput], name=
```
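For what it's worth, a hedged sketch of "one example per step" with a single-layer stand-in for the MLP (the amountInput/amountOutput sizes, network body, learning rate, and data are all assumptions, since the post is truncated); the key detail is slicing train_x[i:i+1] so the batch dimension stays 1:

```python
import numpy as np
import tensorflow as tf  # TF 1.x API, matching the question

amountInput, amountOutput = 784, 10   # placeholder sizes, not from the post
X = tf.placeholder("float", [None, amountInput], name="Input")
Y = tf.placeholder("float", [None, amountOutput], name="Output")

W = tf.Variable(tf.random_normal([amountInput, amountOutput], stddev=0.01))
b = tf.Variable(tf.zeros([amountOutput]))
logits = tf.matmul(X, W) + b
loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=Y, logits=logits))
train_op = tf.train.GradientDescentOptimizer(0.01).minimize(loss)

train_x = np.random.rand(1000, amountInput).astype("float32")      # dummy data
train_y = np.eye(amountOutput)[np.random.randint(0, 10, 1000)].astype("float32")

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for epoch in range(5):
        for i in np.random.permutation(len(train_x)):
            # SGD: exactly one (shuffled) example per train step
            sess.run(train_op, {X: train_x[i:i+1], Y: train_y[i:i+1]})
```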

How to get around an in-place operation error when indexing a leaf variable for a gradient update?

Submitted by 假如想象 on 2019-12-23 12:28:05
Question: I am encountering an in-place operation error when I try to index a leaf variable to update gradients with a customized shrink function. I cannot work around it. Any help is highly appreciated!

```python
import torch.nn as nn
import torch
import numpy as np
from torch.autograd import Variable, Function

# hyper parameters
batch_size = 100  # batch size of images
ld = 0.2          # sparse penalty
lr = 0.1          # learning rate

x = Variable(torch.from_numpy(np.random.normal(0,1,(batch_size,10,10))), requires_grad=False
```
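The snippet is truncated, but the usual way around this error is to perform the indexed update outside the autograd graph. A hedged sketch using the modern torch.no_grad() API rather than the deprecated Variable wrapper (the shrink threshold and step size are illustrative):

```python
import torch

w = torch.randn(10, 10, requires_grad=True)   # leaf variable
loss = (w ** 2).sum()
loss.backward()                               # populates w.grad

with torch.no_grad():
    # indexed in-place updates are fine here: no_grad() keeps them
    # off the autograd graph, so no in-place operation error is raised
    idx = w.abs() < 0.2                       # e.g. a shrinkage threshold
    w[idx] = 0.0
    w -= 0.1 * w.grad                         # manual gradient step
w.grad.zero_()                                # reset for the next iteration
```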

Octave code for gradient descent using vectorization not updating cost function correctly

Submitted by 倾然丶 夕夏残阳落幕 on 2019-12-23 02:38:08
Question: I have implemented the following code for gradient descent using vectorization, but it seems the cost function is not decreasing correctly. Instead, the cost function is increasing with each iteration. Assuming theta is an (n+1)-vector, y is an m-vector, and X is an m×(n+1) design matrix:

```matlab
function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
m = length(y);      % number of training examples
n = length(theta);  % number of features
J_history = zeros(num_iters, 1);
error = ((theta'
```
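The excerpt cuts off at the error computation, but a cost that increases every iteration usually means a sign flip in the update or a step size alpha that is too large. For comparison, a hedged NumPy sketch of the standard vectorized update with cost tracking:

```python
import numpy as np

def gradient_descent(X, y, theta, alpha, num_iters):
    # X: (m, n+1) design matrix, y: (m, 1), theta: (n+1, 1)
    m = len(y)
    J_history = np.zeros(num_iters)
    for it in range(num_iters):
        error = X @ theta - y                        # (m, 1) residuals
        J_history[it] = (error.T @ error).item() / (2 * m)  # cost before update
        theta = theta - (alpha / m) * (X.T @ error)  # subtract: descend, not ascend
    return theta, J_history
```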

How can I have multiple losses in a network in Caffe?

Submitted by 风流意气都作罢 on 2019-12-22 14:53:10
Question: If I define multiple loss layers in a network, will multiple backpropagations happen from those ends to the beginning of the network? I mean, do they even work that way? Suppose I have something like this:

```
Layer1 { }
Layer2 { }
...
Layer_n { }
Layer_cls1 { bottom: layer_n  top: cls1 }
Layer_cls_loss1 { type: some_loss  bottom: cls1  top: loss1 }
Layer_n1 { bottom: layer_n .. }
Layer_n2 { }
...
layer_n3 { }
Layer_cls2 { bottom: layer_n3  top: cls2 }
Layer_cls_loss2 { type: some_loss  bottom: cls2  top: loss2
```
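As far as Caffe's behavior goes, there is still a single backward pass: each loss layer contributes a gradient, scaled by its loss_weight, and the contributions are summed where the paths merge (here, at layer_n). The same idea sketched in PyTorch rather than Caffe prototxt, with hypothetical layer sizes:

```python
import torch

shared = torch.nn.Linear(8, 8)   # stands in for the shared trunk Layer1..Layer_n
head1 = torch.nn.Linear(8, 2)    # classification branch 1
head2 = torch.nn.Linear(8, 2)    # classification branch 2

x = torch.randn(4, 8)
t1 = torch.randint(0, 2, (4,))
t2 = torch.randint(0, 2, (4,))

h = shared(x)
loss1 = torch.nn.functional.cross_entropy(head1(h), t1)
loss2 = torch.nn.functional.cross_entropy(head2(h), t2)

# one backward pass; gradients from both losses are summed into the
# shared trunk (the weights play the role of Caffe's loss_weight)
total = 1.0 * loss1 + 0.5 * loss2
total.backward()
```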

How to convert a deep learning gradient descent equation into Python

Submitted by 对着背影说爱祢 on 2019-12-22 01:05:11
Question: I've been following an online tutorial on deep learning. It has a practical question on gradient descent and cost calculations where I have been struggling to get the given answers once they were converted to Python code. I hope you can kindly help me get the correct answer. Please see the following link for the equations used: Click here to see the equations used for the calculations. Following is the function given to calculate the gradient descent, cost, etc. The values need to be found without
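The linked equations are not reproduced in the excerpt; assuming this is the standard logistic-regression "propagate" exercise found in such tutorials (the function name, array shapes, and return convention below are all assumptions), a hedged NumPy sketch of the forward cost and gradients looks like:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def propagate(w, b, X, Y):
    # w: (n_features, 1), b: scalar, X: (n_features, m), Y: (1, m)
    m = X.shape[1]
    A = sigmoid(w.T @ X + b)                                  # activations, (1, m)
    cost = -np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)) / m  # cross-entropy
    dw = X @ (A - Y).T / m                                    # gradient w.r.t. w
    db = np.sum(A - Y) / m                                    # gradient w.r.t. b
    return {"dw": dw, "db": db}, cost
```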