gradient-descent | 易学教程

How to alternate train op's in tensorflow?

Question: I am implementing an alternating training scheme. The graph contains two training ops, and training should alternate between them. This is relevant for research like this or this. Below is a small example, but it seems to update both ops at every step. How can I explicitly alternate between them? from tensorflow.examples.tutorials.mnist import input_data import tensorflow as tf # Import data mnist = input_data.read_data_sets('/tmp/tensorflow/mnist/input_data', one_hot=True) # Create the
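
The alternation asked about here can be illustrated without TensorFlow. The sketch below is not the asker's graph: the two-parameter loss, `grad_a`, `grad_b`, and `alternate_training` are hypothetical stand-ins for the two training ops. It only shows the control flow, where exactly one update runs per step, chosen by the step's parity. In TensorFlow 1.x the analogous move would presumably be to fetch only one of the two train ops in each `sess.run` call.

```python
# Hypothetical objective: loss(a, b) = (a - 3)^2 + (b + 1)^2.
# Each gradient function stands in for one "train op".
def grad_a(a, b):
    return 2.0 * (a - 3.0)


def grad_b(a, b):
    return 2.0 * (b + 1.0)


def alternate_training(steps=200, lr=0.1):
    a, b = 0.0, 0.0
    updates = {"a": 0, "b": 0}  # counts how often each update ran
    for step in range(steps):
        if step % 2 == 0:
            # Even step: apply only the first update; b is left untouched.
            a -= lr * grad_a(a, b)
            updates["a"] += 1
        else:
            # Odd step: apply only the second update; a is left untouched.
            b -= lr * grad_b(a, b)
            updates["b"] += 1
    return a, b, updates


if __name__ == "__main__":
    print(alternate_training())
```

The `updates` counter makes it easy to confirm that the two updates really were interleaved rather than both applied at every step.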

How to convert deep learning gradient descent equation into python

I've been following an online tutorial on deep learning. It has a practical question on gradient descent and cost calculations, and I have been struggling to get the given answers once it was converted to Python code. I hope you can help me get the correct answer. Please see the following link for the equations used: Click here to see the equations used for the calculations. Following is the function given to calculate the gradient descent, cost, etc. The values need to be found without using for loops, using matrix manipulation operations instead. import numpy as np def propagate(w, b, X, Y): ""
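
The equations behind the link are not reproduced in this excerpt, so the following is only a sketch under an assumption: that `propagate` is meant to compute the forward pass, cross-entropy cost, and gradients of a logistic-regression unit, with `X` of shape `(n_features, m)`, `Y` of shape `(1, m)`, and `w` of shape `(n_features, 1)`. The `sigmoid` helper and the returned dictionary keys are illustrative choices, not taken from the tutorial.

```python
import numpy as np


def sigmoid(z):
    # Element-wise logistic function.
    return 1.0 / (1.0 + np.exp(-z))


def propagate(w, b, X, Y):
    """Vectorized cost and gradients, assuming logistic regression."""
    m = X.shape[1]                       # number of examples (columns of X)
    A = sigmoid(np.dot(w.T, X) + b)      # activations, shape (1, m)
    # Cross-entropy cost averaged over the m examples.
    cost = -np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)) / m
    dw = np.dot(X, (A - Y).T) / m        # gradient w.r.t. w, shape (n_features, 1)
    db = np.sum(A - Y) / m               # gradient w.r.t. b, scalar
    return {"dw": dw, "db": db}, float(cost)


if __name__ == "__main__":
    w = np.zeros((2, 1))
    X = np.array([[1.0, 2.0], [3.0, 4.0]])
    Y = np.array([[1.0, 0.0]])
    print(propagate(w, 0.0, X, Y))
```

No explicit loop over examples appears: the matrix products `np.dot(w.T, X)` and `np.dot(X, (A - Y).T)` handle all `m` columns at once.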

Why deep NN can't approximate simple ln(x) function?

Question: I have created an ANN with two ReLU hidden layers plus a linear activation layer and am trying to approximate the simple ln(x) function, but I can't get a good fit. I am confused because ln(x) in the range x: [0.0, 1.0] should be approximated without problems (I am using a learning rate of 0.01 and basic gradient descent optimization). import tensorflow as tf import numpy as np def GetTargetResult(x): curY = np.log(x) return curY # Create model def multilayer_perceptron(x, weights, biases): # Hidden layer with RELU

Tensorflow: Convert constant tensor from pre-trained Vgg model to variable

My question is: how can I convert a constant tensor loaded from a pre-trained VGG16 model to a tf.Variable tensor? The motivation is that I need to compute the gradient of a specific loss with respect to the Conv4_3 layer's kernel; however, the kernel seems to be set to a tf.Constant type, which is not accepted by the tf.Optimizer.compute_gradients method. F = vgg.graph.get_tensor_by_name('pretrained_vgg16/conv4_3/filter:0') G = optimizer.compute_gradients(losses, var_list=[F]) # TypeError: Argument is not a tf.Variable: Tensor("pretrained_vgg16/conv4_3/filter:0", shape=(3, 3, 512, 512), dtype

Multi variable gradient descent in matlab

I'm doing gradient descent in MATLAB for multiple variables, and the code is not producing the thetas I expected from the normal equation, which are: theta = 1.0e+05 * 3.4041 1.1063 -0.0665. With gradient descent (GDM) the results I get are: theta = 1.0e+05 * 2.6618 -2.6718 -0.5954. I don't understand why; maybe someone can help me and tell me where the mistake in the code is. Code: function [theta, J_history] = gradientDescentMulti(X, y, theta, alpha, num_iters) m = length(y); % number of training examples J_history = zeros(num_iters, 1); thetas = size(theta
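
The MATLAB function above is cut off, so its bug cannot be identified from this excerpt. For comparison, here is a Python/NumPy sketch of a vectorized multi-variable update that mirrors the same signature. The name `gradient_descent_multi` and the squared-error cost are assumptions based on the usual linear-regression setup, not a reconstruction of the asker's code. The key point is that every component of `theta` is updated simultaneously from the same error vector.

```python
import numpy as np


def gradient_descent_multi(X, y, theta, alpha, num_iters):
    """Batch gradient descent for linear regression (squared-error cost)."""
    m = len(y)                              # number of training examples
    J_history = np.zeros(num_iters)         # cost after each iteration
    for i in range(num_iters):
        error = X @ theta - y               # residuals for the current theta
        # Simultaneous update of all parameters from one error vector.
        theta = theta - (alpha / m) * (X.T @ error)
        J_history[i] = np.sum((X @ theta - y) ** 2) / (2 * m)
    return theta, J_history


if __name__ == "__main__":
    # Tiny illustrative dataset: y = 1 + 2 * x, first column is the intercept.
    X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
    y = np.array([1.0, 3.0, 5.0])
    theta, J_history = gradient_descent_multi(X, y, np.zeros(2), 0.1, 5000)
    print(theta, J_history[-1])
```

Plotting or printing `J_history` is a quick diagnostic: if the cost is not decreasing steadily, the learning rate or the feature scaling is a likely place to look.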

Selection of Mini-batch Size for Neural Network Regression

Question: I am doing a neural network regression with 4 features. How do I determine the mini-batch size for my problem? I see people use batch sizes of 100 to 1000 for computer vision with 32*32*3 features per image; does that mean I should use a batch size of 1 million? I have billions of data points and tens of GB of memory, so there is no hard requirement for me not to do that. I also observed that using a mini-batch of size ~1000 makes convergence much faster than a batch size of 1 million. I thought it

How to accumulate gradients in tensorflow?

I have a question similar to this one. Because I have limited resources and I work with a deep model (VGG-16), used to train a triplet network, I want to accumulate gradients for 128 batches of one training example each, and then propagate the error and update the weights. It's not clear to me how to do this. I work with TensorFlow, but any implementation/pseudocode is welcome. Let's walk through the code proposed in one of the answers you linked to: ## Optimizer definition - nothing different from any classical example opt = tf.train.AdamOptimizer() ## Retrieve all trainable variables you
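
Since the asker welcomes any implementation or pseudocode, the idea of accumulating gradients can be sketched framework-free in NumPy. This is not the TensorFlow answer being walked through above (which is truncated here). It uses a hypothetical linear model with squared error and a plain SGD step instead of Adam, and `example_gradient` and `accumulated_step` are illustrative names. Gradients are summed one example at a time, and the weights change only once, after the whole group has been seen.

```python
import numpy as np


def example_gradient(w, x, y):
    # Gradient of 0.5 * (w . x - y)^2 with respect to w, for one example.
    return (np.dot(w, x) - y) * x


def accumulated_step(w, X, Y, lr):
    """One weight update built from per-example gradients."""
    grad_sum = np.zeros_like(w)
    for x, y in zip(X, Y):
        # Accumulate only; the weights stay fixed inside this loop.
        grad_sum += example_gradient(w, x, y)
    # Single update using the average of the accumulated gradients.
    return w - lr * grad_sum / len(X)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(128, 3))        # 128 "batches" of one example each
    Y = X @ np.array([1.0, -2.0, 0.5])
    print(accumulated_step(np.zeros(3), X, Y, lr=0.1))
```

For this plain SGD step the result matches a single update on the full batch of 128, which is the point of accumulation: memory is spent on one example at a time, while the update reflects all of them.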

How to implement mini-batch gradient descent in python?

I have just started to learn deep learning and got stuck on gradient descent. I know how to implement batch gradient descent, and I know how mini-batch and stochastic gradient descent work in theory, but I can't understand how to implement them in code. import numpy as np X = np.array([ [0,0,1],[0,1,1],[1,0,1],[1,1,1] ]) y = np.array([[0,1,1,0]]).T alpha,hidden_dim = (0.5,4) synapse_0 = 2*np.random.random((3,hidden_dim)) - 1 synapse_1 = 2*np.random.random((hidden_dim,1)) - 1 for j in xrange(60000): layer_1 = 1/(1+np.exp(-(np.dot(X,synapse_0)))) layer_2 =
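
The network code above is truncated, so the sketch below does not continue it. Instead it shows the mini-batch mechanics on a deliberately simpler, hypothetical linear model: shuffle the example indices once per epoch, slice them into batches, and apply one update per batch. `minibatches` and `fit_linear_minibatch` are illustrative names. The same batching loop could wrap the forward and backward pass of a network in place of the linear gradient.

```python
import numpy as np


def minibatches(X, y, batch_size, rng):
    """Yield shuffled (X_batch, y_batch) pairs covering every example once."""
    idx = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        batch = idx[start:start + batch_size]   # last batch may be smaller
        yield X[batch], y[batch]


def fit_linear_minibatch(X, y, alpha=0.1, batch_size=8, epochs=200, seed=0):
    rng = np.random.default_rng(seed)
    w = np.zeros((X.shape[1], 1))
    for _ in range(epochs):
        for X_b, y_b in minibatches(X, y, batch_size, rng):
            # Gradient of the mean squared error on this batch only.
            grad = X_b.T @ (X_b @ w - y_b) / len(X_b)
            w -= alpha * grad
    return w


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(64, 3))
    y = X @ np.array([[1.0], [-2.0], [0.5]])
    print(fit_linear_minibatch(X, y).ravel())
```

Setting `batch_size=len(X)` recovers batch gradient descent, and `batch_size=1` gives stochastic gradient descent, so the three variants differ only in how many rows each update sees.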

Gradient descent and normal equation method for solving linear regression gives different solutions

I'm working on a machine learning problem and want to use linear regression as the learning algorithm. I have implemented 2 different methods to find the parameters theta of the linear regression model: gradient (steepest) descent and the normal equation. On the same data they should both give approximately equal theta vectors. However, they do not. Both theta vectors are very similar on all elements but the first one, which is the one that multiplies the vector of all 1s added to the data. Here is what the thetas look like (first column is the output of gradient descent, second the output of the normal equation): Grad desc Norm
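
One way to sanity-check the expectation stated here, that both methods should agree, is to run them side by side on small synthetic data. The sketch below is not the asker's code and does not diagnose their discrepancy. The data, `normal_equation`, and `gradient_descent` are hypothetical, and the features are drawn on a comparable scale so that a fixed learning rate converges.

```python
import numpy as np


def normal_equation(X, y):
    # Solve (X^T X) theta = X^T y without forming an explicit inverse.
    return np.linalg.solve(X.T @ X, X.T @ y)


def gradient_descent(X, y, alpha=0.1, num_iters=5000):
    m = len(y)
    theta = np.zeros(X.shape[1])
    for _ in range(num_iters):
        # Gradient of the squared-error cost J = sum((X theta - y)^2) / (2m).
        theta -= (alpha / m) * (X.T @ (X @ theta - y))
    return theta


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    features = rng.normal(size=(50, 2))
    # The column of ones carries the intercept (the "first element" of theta).
    X = np.column_stack([np.ones(50), features])
    y = 4.0 + features @ np.array([2.0, -3.0]) + 0.1 * rng.normal(size=50)
    print(gradient_descent(X, y))
    print(normal_equation(X, y))
```

If two implementations agree on data like this but disagree on the real dataset, the difference is more likely to come from preprocessing or convergence settings than from the update rule itself.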

What are alternatives of Gradient Descent?

Question: Gradient descent has a problem with local minima. We may need to run gradient descent an exponential number of times to find the global minimum. Can anybody tell me about any alternatives to gradient descent, with their pros and cons? Thanks. Answer 1: This is more a problem with the function being minimized than with the method used. If finding the true global minimum is important, then use a method such as simulated annealing. This will be able to find the global minimum, but may take a very long time to do so. In the
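
To make the simulated annealing suggestion concrete, here is a minimal one-dimensional sketch. The linear cooling schedule, the uniform proposal, and the default parameter values are assumptions chosen for illustration, and the test function `bumpy` is hypothetical. Worse candidates are accepted with probability `exp(-(increase in f) / temperature)`, which is what lets the search climb out of a local minimum while the temperature is still high.

```python
import math
import random


def simulated_annealing(f, x0, steps=20000, step_size=0.5, t0=5.0, seed=0):
    rng = random.Random(seed)
    x, fx = x0, f(x0)
    best_x, best_fx = x, fx
    for k in range(steps):
        # Linear cooling from t0 towards (almost) zero.
        t = t0 * (1.0 - k / steps) + 1e-9
        cand = x + rng.uniform(-step_size, step_size)
        fc = f(cand)
        # Always accept improvements; accept worse moves with
        # probability exp(-(fc - fx) / t).
        if fc <= fx or rng.random() < math.exp((fx - fc) / t):
            x, fx = cand, fc
            if fx < best_fx:
                best_x, best_fx = x, fx
    return best_x, best_fx


def bumpy(x):
    # Hypothetical test function with a local minimum near x = 3.8
    # and a lower, global minimum near x = -1.3.
    return x * x + 10.0 * math.sin(x)


if __name__ == "__main__":
    # Start inside the basin of the local minimum.
    print(simulated_annealing(bumpy, x0=4.0))
```

Plain gradient descent started at `x0=4.0` would settle in the nearby local minimum. The annealing run trades that determinism for many more function evaluations, which matches the answer's caveat that the method may take a long time.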