gradient-descent | 易学教程

How can I have multiple losses in a network in Caffe?

If I define multiple loss layers in a network, will backpropagation run separately from each of those loss layers back to the beginning of the network? In other words, is that even how they work? Suppose I have something like this:

Layer1{ }
Layer2{ }
...
Layer_n{ }
Layer_cls1{ bottom:layer_n top:cls1 }
Layer_cls_loss1{ type:some_loss bottom:cls1 top:loss1 }
Layer_n1{ bottom:layer_n .. }
Layer_n2{ }
...
layer_n3{ }
Layer_cls2{ bottom:layer_n3 top:cls2 }
Layer_cls_loss2{ type:some_loss bottom:cls2 top:loss2 }
layer_n4{ bottom:layer_n3 .. }
...
layer_cls3End{ top:cls_end bottom:... }
loss{ bottom:cls_end top
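
In Caffe, the objective being minimized is the sum of all loss layers' outputs, each scaled by its loss_weight (1 by default for loss layers), and one backward pass propagates that combined objective. Where a blob feeds several layers, the gradients coming back from each branch are summed. The following plain-Python sketch is not Caffe code: the tiny two-head model, its numbers, and the weights LW1/LW2 are hypothetical. It shows that summing per-head gradients at a shared layer gives the gradient of the weighted total loss, checked against a finite difference.

```python
# Hypothetical two-branch model: one shared weight w feeds two heads (a and b),
# each with its own squared-error loss, like two loss layers sharing a trunk.
LW1, LW2 = 1.0, 0.5  # per-loss weights, playing the role of loss_weight

def total_loss(w, a, b, x, t1, t2):
    h = w * x                          # shared layer output
    loss1 = (a * h - t1) ** 2          # first loss layer
    loss2 = (b * h - t2) ** 2          # second loss layer
    return LW1 * loss1 + LW2 * loss2   # one combined objective

def grad_w(w, a, b, x, t1, t2):
    h = w * x
    # Each head sends its own dLoss/dh back; at the shared blob they are summed.
    dh = LW1 * 2 * (a * h - t1) * a + LW2 * 2 * (b * h - t2) * b
    return dh * x                      # chain rule through h = w * x

params = dict(a=1.5, b=-2.0, x=3.0, t1=1.0, t2=0.5)
w, eps = 0.7, 1e-6
analytic = grad_w(w, **params)
numeric = (total_loss(w + eps, **params) - total_loss(w - eps, **params)) / (2 * eps)
print(analytic, numeric)  # both approximately 47.55
```

So there is no separate backward pass per loss: earlier layers receive the sum of the (weighted) gradients from every loss that depends on them.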

AdamOptimizer and GradientDescentOptimizer from tensorflow not able to fit simple data

Question: Similar question: here. I am trying out TensorFlow. I generated simple, linearly separable data and tried to fit a linear equation to it. Here is the code:

np.random.seed(2010)
n = 300
x_data = np.random.random([n, 2]).tolist()
y_data = [[1., 0.] if v[0] > 0.5 else [0., 1.] for v in x_data]
x = tf.placeholder(tf.float32, [None, 2])
W = tf.Variable(tf.zeros([2, 2]))
b = tf.Variable(tf.zeros([2]))
y = tf.sigmoid(tf.matmul(x, W) + b)
y_ = tf.placeholder(tf.float32, [None, 2])
cross
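
The excerpt is cut off before the loss and any answer, so the cause isn't visible here. As a baseline, plain batch gradient descent on logistic regression does fit this kind of data. The sketch below is a stand-in under stated assumptions, not the question's TensorFlow code: it uses Python's random module (so the points differ from np.random.seed(2010)), a single sigmoid output with binary cross-entropy instead of two one-hot columns, and an arbitrarily chosen learning rate of 1.0 for 2000 full-batch steps.

```python
import math
import random

# Same recipe as the question: points in the unit square,
# label 1 when the first coordinate exceeds 0.5.
random.seed(2010)
n = 300
xs = [[random.random(), random.random()] for _ in range(n)]
ys = [1.0 if v[0] > 0.5 else 0.0 for v in xs]

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

w = [0.0, 0.0]
b = 0.0
lr = 1.0
for epoch in range(2000):
    gw = [0.0, 0.0]
    gb = 0.0
    for (x0, x1), t in zip(xs, ys):
        p = sigmoid(w[0] * x0 + w[1] * x1 + b)
        # Gradient of binary cross-entropy w.r.t. the logit is (p - t).
        gw[0] += (p - t) * x0
        gw[1] += (p - t) * x1
        gb += p - t
    w[0] -= lr * gw[0] / n
    w[1] -= lr * gw[1] / n
    b -= lr * gb / n

correct = sum((sigmoid(w[0] * x0 + w[1] * x1 + b) > 0.5) == (t == 1.0)
              for (x0, x1), t in zip(xs, ys))
accuracy = correct / n
print(w, b, accuracy)
```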

Multi variable gradient descent in matlab

Question: I'm doing gradient descent in MATLAB for multiple variables, and the code doesn't give the thetas I expect. With the normal equation, which I have also implemented, I get:

theta =
   1.0e+05 *
    3.4041
    1.1063
   -0.0665

With gradient descent (GDM), the results I get are:

theta =
   1.0e+05 *
    2.6618
   -2.6718
   -0.5954

I don't understand why. Can someone help me and tell me where the mistake in the code is? Code:

function [theta, J_history] = gradientDescentMulti(X, y, theta, alpha, num
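
The MATLAB function is cut off, so the bug itself can't be seen. For reference, the usual batch update for linear regression is theta := theta - (alpha/m) * X' * (X*theta - y), applied to all components at once, so every component uses residuals computed with the old theta. Also note that if X is feature-normalized before gradient descent, the resulting theta refers to the normalized features and will not numerically match a normal-equation solution computed on the raw features. A plain-Python sketch of the update, assuming a small made-up dataset whose exact fit is known (the function name mirrors the question's; alpha = 0.1 and the iteration count are assumptions):

```python
def gradient_descent_multi(X, y, theta, alpha, num_iters):
    """Batch gradient descent for linear regression (simultaneous update)."""
    m = len(y)
    theta = list(theta)
    for _ in range(num_iters):
        # Residuals X*theta - y, all computed with the current (old) theta.
        errors = [sum(xj * tj for xj, tj in zip(row, theta)) - yi
                  for row, yi in zip(X, y)]
        # Gradient (1/m) * X' * errors, one entry per parameter.
        grad = [sum(err * row[j] for err, row in zip(errors, X)) / m
                for j in range(len(theta))]
        theta = [tj - alpha * gj for tj, gj in zip(theta, grad)]
    return theta

# Made-up data generated exactly from y = 1 + 2*x1 + 3*x2 (first column is the bias).
points = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1), (1, 2)]
X = [[1.0, x1, x2] for x1, x2 in points]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in points]
theta = gradient_descent_multi(X, y, [0.0, 0.0, 0.0], alpha=0.1, num_iters=5000)
print(theta)  # close to [1.0, 2.0, 3.0]
```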

Implementing gradient descent in TensorFlow instead of using the one provided with it

Question: I want to use gradient descent with momentum (keeping track of previous gradients) while building a classifier in TensorFlow. So I don't want to use tensorflow.train.GradientDescentOptimizer; instead I want to use tensorflow.gradients to calculate the gradients, keep track of previous gradients, and update the weights based on all of them. How do I do this in TensorFlow?

Answer 1: TensorFlow has an implementation of gradient descent with momentum. To answer your general question about implementing your own
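
The update that tf.train.MomentumOptimizer documents (without Nesterov) is accumulation = momentum * accumulation + gradient, followed by variable -= learning_rate * accumulation. Doing it by hand only requires keeping one accumulator per weight across steps; the gradient itself could come from tf.gradients. A framework-free Python sketch, assuming a hypothetical one-parameter objective f(w) = (w - 3)^2 and arbitrary hyperparameters:

```python
def grad(w):
    # Derivative of the hypothetical objective f(w) = (w - 3)^2.
    return 2.0 * (w - 3.0)

def momentum_descent(w, learning_rate=0.1, momentum=0.9, steps=300):
    accumulation = 0.0  # decayed running sum of past gradients
    for _ in range(steps):
        accumulation = momentum * accumulation + grad(w)
        w -= learning_rate * accumulation
    return w

w_final = momentum_descent(0.0)
print(w_final)  # close to 3.0
```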

Tensorflow gradient with respect to matrix

Just for context, I'm trying to implement a gradient descent algorithm with TensorFlow. I have a matrix X

$$X = \begin{bmatrix} x_1 & x_2 & x_3 & x_4 \\ x_5 & x_6 & x_7 & x_8 \end{bmatrix}$$

which I multiply by some feature vector Y to get Z:

$$Z = X \begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \end{bmatrix} = \begin{bmatrix} z_1 \\ z_2 \end{bmatrix}$$

I then put Z through a softmax function and take the log. I'll refer to the output matrix as W. All this is implemented as follows (with a little boilerplate added so it's runnable):

sess = tf.Session()
num_features = 4
num_actions = 2
policy_matrix = tf.get_variable("params", (num_actions, num_features))
state_ph = tf.placeholder("float", (num_features, 1))
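
For reference, with p = softmax(Z) and W_i = log p_i, the chain rule gives dW_i/dX_jk = (delta_ij - p_j) * y_k, since W_i = z_i - logsumexp(z) and only z_j depends on row j of X. Note that tf.gradients(ys, xs) returns the gradient of the sum of ys, so requesting the gradient of the vector W with respect to X yields the sum over i of these terms, not the full Jacobian. The plain-Python sketch below (hypothetical numbers, no TensorFlow) computes the full Jacobian analytically and checks it against central finite differences.

```python
import math

def log_softmax(z):
    m = max(z)
    lse = m + math.log(sum(math.exp(v - m) for v in z))
    return [v - lse for v in z]

def forward(X, y):
    # Z = X y, then W = log(softmax(Z)).
    z = [sum(xjk * yk for xjk, yk in zip(row, y)) for row in X]
    return log_softmax(z)

def analytic_jacobian(X, y):
    # J[i][j][k] = dW_i / dX_jk = (delta_ij - p_j) * y_k
    p = [math.exp(v) for v in forward(X, y)]
    n, f = len(X), len(y)
    return [[[((1.0 if i == j else 0.0) - p[j]) * y[k] for k in range(f)]
             for j in range(n)] for i in range(n)]

def numeric_jacobian(X, y, eps=1e-6):
    n, f = len(X), len(y)
    J = [[[0.0] * f for _ in range(n)] for _ in range(n)]
    for j in range(n):
        for k in range(f):
            Xp = [row[:] for row in X]
            Xm = [row[:] for row in X]
            Xp[j][k] += eps
            Xm[j][k] -= eps
            Wp, Wm = forward(Xp, y), forward(Xm, y)
            for i in range(n):
                J[i][j][k] = (Wp[i] - Wm[i]) / (2 * eps)
    return J

X = [[0.1, -0.2, 0.3, 0.4], [0.5, 0.0, -0.1, 0.2]]  # 2 actions x 4 features
y = [1.0, 2.0, -1.0, 0.5]
A, N = analytic_jacobian(X, y), numeric_jacobian(X, y)
max_err = max(abs(a - b) for Ai, Ni in zip(A, N)
              for Aj, Nj in zip(Ai, Ni) for a, b in zip(Aj, Nj))
print(max_err)  # tiny: analytic and numeric Jacobians agree
```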

How do I switch tf.train.Optimizers during training?

I want to switch from Adam to SGD after a certain number of epochs. How do I do this smoothly so that the weights/gradients are passed over to the new optimizer?

Just define two optimizers and switch between them:

sgd_optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)
adap_optimizer = tf.train.AdamOptimizer(learning_rate).minimize(cost)
...
for epoch in range(100):
    for (x, y) in zip(train_X, train_Y):
        optimizer = sgd_optimizer if epoch > 50 else adap_optimizer
        sess.run(optimizer, feed_dict={X: x, Y: y})

An optimizer only encapsulates the way to apply the gradients to
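
The idea behind this answer is that the model parameters live in the variables, not in the optimizer; an optimizer only decides how gradients are applied, and any extra state it keeps (Adam's moment estimates) is separate and simply stops being used after the switch. A plain-Python sketch of that idea, assuming a hypothetical objective f(w) = (w - 3)^2 and arbitrary step counts and learning rates, switching from Adam to SGD after 50 steps:

```python
import math

def grad(w):
    # Derivative of the hypothetical objective f(w) = (w - 3)^2.
    return 2.0 * (w - 3.0)

def adam_step(w, state, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    # (m, v, t) is Adam's own state; the parameter w is passed in from outside.
    m, v, t = state
    g = grad(w)
    t += 1
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g * g
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    return w - lr * m_hat / (math.sqrt(v_hat) + eps), (m, v, t)

def sgd_step(w, lr=0.1):
    return w - lr * grad(w)

w = 0.0
adam_state = (0.0, 0.0, 0)
for step in range(250):
    if step < 50:
        w, adam_state = adam_step(w, adam_state)
    else:
        w = sgd_step(w)  # same w carries over; only the update rule changes
print(w)  # close to 3.0
```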

Simple gradient descent using mxnet

I'm trying to use MXNet's gradient descent optimizers to minimize a function. The equivalent example in TensorFlow would be:

import tensorflow as tf

x = tf.Variable(2, name='x', dtype=tf.float32)
log_x = tf.log(x)
log_x_squared = tf.square(log_x)
optimizer = tf.train.GradientDescentOptimizer(0.5)
train = optimizer.minimize(log_x_squared)
init = tf.initialize_all_variables()

def optimize():
    with tf.Session() as session:
        session.run(init)
        print("starting at", "x:", session.run(x), "log(x)^2:", session.run(log_x_squared))
        for step in range(10):
            session.run(train)
            print("step", step, "x:", session
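
Whatever framework is used, the computation being asked for is small: start at x = 2 and repeatedly apply x := x - 0.5 * d/dx[log(x)^2], where the derivative is 2*log(x)/x. The sketch below performs the same ten steps by hand in plain Python, with no MXNet or TensorFlow calls, so it can serve as a reference trajectory when checking an optimizer API.

```python
import math

# Plain gradient descent on f(x) = log(x)^2, mirroring the TensorFlow example:
# start at x = 2, learning rate 0.5, 10 steps.
x = 2.0
learning_rate = 0.5
print("starting at", "x:", x, "log(x)^2:", math.log(x) ** 2)
for step in range(10):
    grad = 2.0 * math.log(x) / x  # d/dx log(x)^2 = 2 log(x) / x
    x -= learning_rate * grad
    print("step", step, "x:", x, "log(x)^2:", math.log(x) ** 2)
```

The iterate approaches the minimizer x = 1, where log(x)^2 = 0.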

How to accumulate gradients in tensorflow?

Question: I have a question similar to this one. Because I have limited resources and work with a deep model (VGG-16), used to train a triplet network, I want to accumulate gradients over 128 batches of one training example each, and then propagate the error and update the weights. It's not clear to me how to do this. I work with TensorFlow, but any implementation or pseudocode is welcome.

Answer 1: Let's walk through the code proposed in one of the answers you linked to:

## Optimizer definition - nothing
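
The accumulate-then-apply pattern can be sketched without any framework: compute each single-example gradient with the weights held fixed, add it to an accumulator, and after all mini-batches apply one update with the averaged gradient. This is a plain-Python sketch with a hypothetical 1-D linear model and made-up data; four examples stand in for the question's 128.

```python
# Hypothetical per-example loss: 0.5 * (w*x + b - t)^2 for a 1-D linear model.
def example_grad(w, b, x, t):
    err = w * x + b - t
    return err * x, err  # (dLoss/dw, dLoss/db)

data = [(0.0, 1.0), (1.0, 2.9), (2.0, 5.2), (3.0, 7.1)]  # hypothetical (x, t) pairs
w, b = 0.0, 0.0
lr = 0.05
accum_steps = len(data)  # stands in for the 128 single-example batches

# 1) Run each single-example batch and add its gradient to the accumulators,
#    without changing the weights in between.
acc_w, acc_b = 0.0, 0.0
for x, t in data:
    gw, gb = example_grad(w, b, x, t)
    acc_w += gw
    acc_b += gb

# 2) Apply one update with the averaged gradient, then reset the accumulators.
mean_gw, mean_gb = acc_w / accum_steps, acc_b / accum_steps
w -= lr * mean_gw
b -= lr * mean_gb
acc_w, acc_b = 0.0, 0.0
print(mean_gw, mean_gb, w, b)
```

The single update equals one full-batch step on the same examples, provided the loss is averaged over examples and no layer depends on batch statistics (batch normalization, for instance, would only ever see batches of one).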

What's different about momentum gradient update in Tensorflow and Theano like this?

I'm trying to use TensorFlow in my deep learning project. I need to implement my gradient update with this formula:

I have also implemented this part in Theano, and it produced the expected result. But when I use TensorFlow's MomentumOptimizer, the result is really bad. I don't know what the difference between them is.

Theano:

def gradient_updates_momentum_L2(cost, params, learning_rate, momentum, weight_cost_strength):
    # Make sure momentum is a sane value
    assert momentum < 1 and momentum >= 0
    # List of update steps for each parameter
    updates = []
    # Just gradient descent on cost
    for param
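
The Theano code is cut off, so it isn't visible which momentum rule it uses. One frequent source of mismatch, offered as an assumption rather than a diagnosis: some Theano momentum recipes keep a damped velocity v := mu*v + (1 - mu)*g and apply p := p - lr*v, whereas tf.train.MomentumOptimizer documents a := mu*a + g and p := p - lr*a. By induction v = (1 - mu)*a, so the two agree only when TensorFlow's learning rate is lr*(1 - mu); with mu = 0.9 the same nominal learning rate is effectively ten times larger in TensorFlow. The plain-Python sketch below checks this on a hypothetical objective f(p) = (p - 3)^2, with the L2 weight-cost term omitted.

```python
def grad(p):
    # Derivative of the hypothetical objective f(p) = (p - 3)^2.
    return 2.0 * (p - 3.0)

def damped_momentum(p, lr, mu, steps):
    # v <- mu*v + (1 - mu)*g ; p <- p - lr*v
    v, path = 0.0, []
    for _ in range(steps):
        v = mu * v + (1.0 - mu) * grad(p)
        p -= lr * v
        path.append(p)
    return path

def tf_style_momentum(p, lr, mu, steps):
    # a <- mu*a + g ; p <- p - lr*a
    a, path = 0.0, []
    for _ in range(steps):
        a = mu * a + grad(p)
        p -= lr * a
        path.append(p)
    return path

lr, mu = 0.1, 0.9
damped = damped_momentum(0.0, lr, mu, 50)
matched = tf_style_momentum(0.0, lr * (1.0 - mu), mu, 50)  # rescaled learning rate
same_lr = tf_style_momentum(0.0, lr, mu, 50)               # naive port
print(max(abs(a - b) for a, b in zip(damped, matched)))    # essentially zero
print(max(abs(a - b) for a, b in zip(damped, same_lr)))    # clearly nonzero
```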

TensorFlow's ReluGrad claims input is not finite

I'm trying out TensorFlow and I'm running into a strange error. I edited the deep MNIST example to use another set of images, and the algorithm converges nicely again until around iteration 8000 (accuracy 91% at that point), when it crashes with the following error:

tensorflow.python.framework.errors.InvalidArgumentError: ReluGrad input is not finite

At first I thought some coefficients might be reaching the limit of a float, but adding L2 regularization on all weights and biases didn't resolve the issue. It's always the first ReLU application that shows up in the stack trace:

h_conv1 = tf.nn
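
The excerpt stops before any answer. One common way to get non-finite gradients with the deep MNIST example, stated as a possibility rather than the confirmed cause, is a hand-written loss of the form -tf.reduce_sum(y_ * tf.log(y_conv)): once a softmax probability underflows to exactly 0, log gives -inf (and 0 * -inf is NaN for classes whose label is 0), so the gradients flowing back through the network, ReluGrad included, are no longer finite. Computing log-softmax directly from the logits avoids the underflow; in TensorFlow, tf.nn.softmax_cross_entropy_with_logits works from logits. A plain-Python illustration with hypothetical logits:

```python
import math

def naive_softmax(logits):
    # Direct exponentiation: tiny probabilities underflow to exactly 0.0.
    exps = [math.exp(v) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def log_softmax(logits):
    # Log-sum-exp with the max subtracted: stays finite for large logit gaps.
    m = max(logits)
    lse = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - lse for v in logits]

logits = [0.0, -1000.0]  # hypothetical, very confident network output
probs = naive_softmax(logits)
print(probs)                # [1.0, 0.0]: the second probability underflowed
print(log_softmax(logits))  # [0.0, -1000.0]: finite log-probabilities
# math.log(probs[1]) would raise ValueError here; in float32 TensorFlow, log(0) is -inf.
```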