gradient-descent

Multi-layer neural network back-propagation formula (using stochastic gradient descent)

Submitted by 一曲冷凌霜 on 2019-12-08 05:13:20
Question: Using the notations from Backpropagation calculus | Deep learning, chapter 4, I have this back-propagation code for a 4-layer (i.e. 2 hidden layers) neural network:

def sigmoid_prime(z):
    return z * (1 - z)  # because σ'(x) = σ(x) (1 - σ(x))

def train(self, input_vector, target_vector):
    a = np.array(input_vector, ndmin=2).T
    y = np.array(target_vector, ndmin=2).T
    # forward
    A = [a]
    for k in range(3):
        a = sigmoid(np.dot(self.weights[k], a))  # zero bias here just for simplicity
        A.append(a)
    # Now A has 4 elements: the input vector + the 3 output vectors
    # back-propagation
    delta = a - y
    for k in [2, 1
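Since the excerpt cuts off inside the backward loop, here is a minimal, self-contained sketch (my own, not the asker's class) of the same forward/backward pass for a three-weight-matrix network, assuming sigmoid activations, no biases, and the quadratic cost used in the referenced chapter; note the output delta here includes σ'(a_L), whereas the excerpt's `delta = a - y` corresponds to a cross-entropy cost. Names such as `train_step` and `lr` are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime_from_output(a):
    # derivative expressed in terms of the sigmoid output a = sigmoid(z)
    return a * (1.0 - a)

def train_step(weights, x, y, lr=0.1):
    """One SGD step for a 4-layer net (3 weight matrices), no biases,
    quadratic cost C = 0.5 * ||a_L - y||^2."""
    a = np.array(x, ndmin=2).T
    y = np.array(y, ndmin=2).T

    # forward pass, keeping every activation
    A = [a]
    for W in weights:
        a = sigmoid(W @ a)
        A.append(a)

    # backward pass: output delta, then propagate it layer by layer
    delta = (A[-1] - y) * sigmoid_prime_from_output(A[-1])
    for k in [2, 1, 0]:
        grad_W = delta @ A[k].T                     # dC/dW_k = delta . a_{k-1}^T
        if k > 0:
            delta = (weights[k].T @ delta) * sigmoid_prime_from_output(A[k])
        weights[k] -= lr * grad_W                   # SGD update
    return weights

# usage sketch with arbitrary layer sizes 4 -> 5 -> 5 -> 2
rng = np.random.default_rng(0)
weights = [rng.standard_normal((5, 4)),
           rng.standard_normal((5, 5)),
           rng.standard_normal((2, 5))]
weights = train_step(weights, x=[0.1, 0.2, 0.3, 0.4], y=[0.0, 1.0])
```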

"setting an array element with a sequence" error in scikit-learn GradientBoostingClassifier

Submitted by 泄露秘密 on 2019-12-08 04:51:00
Question: Here is my code — does anyone have any idea what is wrong? The error happens when I call fit:

import pandas as pd
import numpy as np
from sklearn.ensemble import (RandomTreesEmbedding, RandomForestClassifier,
                              GradientBoostingClassifier)
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer

n_estimators = 10
d = {'f1': [1, 2], 'f2': ['foo goo', 'goo zoo'], 'target': [0, 1]}
df = pd.DataFrame(data=d)
X_train, X_test, y_train, y_test = train
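The excerpt stops before the fit call, but this error typically means a raw text column (here 'f2') ends up inside the numeric feature matrix. A hedged sketch of one common fix — vectorizing the text column with a ColumnTransformer before the classifier; the column names mirror the excerpt, the rest is illustrative and not necessarily the asker's actual bug:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import Pipeline

d = {'f1': [1, 2], 'f2': ['foo goo', 'goo zoo'], 'target': [0, 1]}
df = pd.DataFrame(data=d)

# Vectorize the text column so every feature fed to the booster is numeric.
preprocess = ColumnTransformer(
    transformers=[('text', CountVectorizer(), 'f2')],
    remainder='passthrough',          # keep the numeric column 'f1' as-is
)
model = Pipeline([
    ('prep', preprocess),
    ('gbm', GradientBoostingClassifier(n_estimators=10)),
])
model.fit(df[['f1', 'f2']], df['target'])
print(model.predict(df[['f1', 'f2']]))
```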

Estimating linear regression with Gradient Descent (Steepest Descent)

Submitted by 人走茶凉 on 2019-12-08 04:02:19
Question: Example data:

X <- matrix(c(rep(1, 97), runif(97)), nrow = 97, ncol = 2)
y <- matrix(runif(97), nrow = 97, ncol = 1)

I have succeeded in creating the cost function:

COST <- function(theta, X, y){
  ### Calculate half MSE
  sum((X %*% theta - y)^2) / (2 * length(y))
}

However, when I run this function, it seems to fail to converge over 100 iterations.

theta <- matrix(0, nrow = 2, ncol = 1)
num.iters <- 1500
delta = 0
GD <- function(X, y, theta, alpha, num.iters){
  for (i in num.iters){
    while (max(abs(delta)) < tolerance){
      error <
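Since the excerpt stops mid-function, here is a minimal sketch of the same steepest-descent loop for linear regression, written in NumPy rather than R (the synthetic data and names such as `cost` and `gradient_descent` are illustrative). Note that it iterates a fixed number of steps with `range(num_iters)`; the excerpt's `for (i in num.iters)` loops over the single value 1500 and so runs only one pass, which is one likely reason it appears not to converge:

```python
import numpy as np

def cost(theta, X, y):
    # half mean squared error, matching the COST function in the question
    residual = X @ theta - y
    return np.sum(residual ** 2) / (2 * len(y))

def gradient_descent(X, y, theta, alpha=0.1, num_iters=1500):
    m = len(y)
    history = []
    for _ in range(num_iters):                      # fixed number of full-batch steps
        gradient = X.T @ (X @ theta - y) / m        # gradient of the half-MSE cost
        theta = theta - alpha * gradient
        history.append(cost(theta, X, y))
    return theta, history

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(97), rng.random(97)])  # intercept column + one feature
y = rng.random((97, 1))
theta0 = np.zeros((2, 1))
theta, history = gradient_descent(X, y, theta0)
print(theta.ravel(), history[-1])
```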

How do I switch tf.train.Optimizers during training?

Submitted by 倖福魔咒の on 2019-12-07 17:40:28
Question: I want to switch from Adam to SGD after a certain number of epochs. How do I do this smoothly so that the weights/gradients are passed over to the new optimizer?

Answer 1: Just define two optimizers and switch between them:

sgd_optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)
adap_optimizer = tf.train.AdamOptimizer(learning_rate).minimize(cost)
...
for epoch in range(100):
    for (x, y) in zip(train_X, train_Y):
        optimizer = sgd_optimizer if epoch > 50 else adap_optimizer
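A hedged, self-contained sketch of that two-optimizer pattern on a toy linear-regression problem, assuming TF1-style graph code run through the tf.compat.v1 layer of TensorFlow 2.x (the data and variable names are illustrative). Both minimize ops update the same variables, so the weights carry over across the switch; only Adam's internal moment estimates go unused afterwards:

```python
import numpy as np
import tensorflow.compat.v1 as tf  # assumption: TF 2.x with the v1 compatibility layer
tf.disable_eager_execution()

train_X = np.linspace(-1.0, 1.0, 100).astype(np.float32)
train_Y = 3.0 * train_X + 0.5

x = tf.placeholder(tf.float32)
y = tf.placeholder(tf.float32)
w = tf.Variable(0.0)
b = tf.Variable(0.0)
cost = tf.reduce_mean(tf.square(w * x + b - y))

learning_rate = 0.01
sgd_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)
adam_step = tf.train.AdamOptimizer(learning_rate).minimize(cost)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for epoch in range(100):
        step = sgd_step if epoch > 50 else adam_step   # Adam first, SGD after epoch 50
        for xi, yi in zip(train_X, train_Y):
            sess.run(step, feed_dict={x: xi, y: yi})
    print(sess.run([w, b]))
```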

Tensorflow gradient with respect to matrix

Submitted by ♀尐吖头ヾ on 2019-12-07 13:48:05
Question: Just for context, I'm trying to implement a gradient descent algorithm with TensorFlow. I have a 2×4 matrix

X = [ x1 x2 x3 x4 ]
    [ x5 x6 x7 x8 ]

which I multiply by a 4×1 feature vector Y = (y1, y2, y3, y4)ᵀ to get the 2×1 vector Z = X·Y = (z1, z2)ᵀ. I then put Z through a softmax function and take the log. I'll refer to the output matrix as W. All this is implemented as follows (a little boilerplate added so it's runnable):

sess = tf.Session()
num_features = 4
num_actions = 2
policy_matrix = tf.get
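The asker's own code is cut off, but as a hedged sketch of the underlying question — the gradient of log(softmax(X·Y)) with respect to the matrix X — here is one way to get it with a GradientTape, assuming TensorFlow 2.x eager execution rather than the Session-based code in the excerpt:

```python
import tensorflow as tf  # assumption: TensorFlow 2.x, eager execution

num_features, num_actions = 4, 2
X = tf.Variable(tf.random.normal((num_actions, num_features)))   # the 2x4 matrix
y = tf.constant([[1.0], [2.0], [3.0], [4.0]])                     # the 4x1 feature vector

with tf.GradientTape() as tape:
    z = tf.matmul(X, y)                  # 2x1
    w = tf.nn.log_softmax(z, axis=0)     # log(softmax(z)), still 2x1
    loss = w[0, 0]                       # differentiate one scalar output, e.g. action 0

grad_X = tape.gradient(loss, X)          # same 2x4 shape as X
print(grad_X.numpy())
```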

Simple gradient descent using mxnet

Submitted by 我只是一个虾纸丫 on 2019-12-07 10:24:57
Question: I'm trying to use MXNet's gradient descent optimizers to minimize a function. The equivalent example in TensorFlow would be:

import tensorflow as tf

x = tf.Variable(2, name='x', dtype=tf.float32)
log_x = tf.log(x)
log_x_squared = tf.square(log_x)
optimizer = tf.train.GradientDescentOptimizer(0.5)
train = optimizer.minimize(log_x_squared)
init = tf.initialize_all_variables()

def optimize():
    with tf.Session() as session:
        session.run(init)
        print("starting at", "x:", session.run(x), "log(x)^2:",
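A hedged sketch of one way to do the same minimization with MXNet's NDArray autograd, assuming the MXNet 1.x imperative API (the learning rate and loop length mirror the TensorFlow snippet; everything else is illustrative):

```python
from mxnet import autograd, nd

x = nd.array([2.0])
x.attach_grad()                          # tell autograd to track gradients for x
lr = 0.5

for step in range(10):
    with autograd.record():
        loss = nd.log(x) ** 2            # the function being minimized: log(x)^2
    loss.backward()
    x[:] = x - lr * x.grad               # plain gradient-descent update, in place
    print(step, "x:", x.asscalar(), "log(x)^2:", (nd.log(x) ** 2).asscalar())
```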

`warm_start` Parameter And Its Impact On Computational Time

Submitted by 梦想的初衷 on 2019-12-07 08:23:48
Question: I have a logistic regression model with a defined set of parameters (warm_start=True). As always, I call LogisticRegression.fit(X_train, y_train) and afterwards use the model to predict new outcomes. Suppose I alter some parameters, say C=100, and call the .fit method again using the same training data. Theoretically, I think the second .fit should take less computational time than a model with warm_start=False. However, empirically this is not actually the case. Please help me
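A hedged sketch of how one might measure this on synthetic data (the solver, sizes, and helper name are illustrative). With warm_start=True the second fit merely starts from the previous coefficients, so any saving depends on how far the new optimum for C=100 lies from the old one — it is not guaranteed to be faster:

```python
import time
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X_train, y_train = make_classification(n_samples=20000, n_features=50, random_state=0)

def timed_refit(warm_start):
    clf = LogisticRegression(warm_start=warm_start, solver='lbfgs', max_iter=1000)
    clf.fit(X_train, y_train)             # first fit with the default C=1.0
    clf.set_params(C=100)                 # change the regularization strength
    t0 = time.perf_counter()
    clf.fit(X_train, y_train)             # second fit starts from old coef_ if warm_start
    return time.perf_counter() - t0

print("warm refit:", timed_refit(True))
print("cold refit:", timed_refit(False))
```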

TensorFlow's ReluGrad claims input is not finite

Submitted by 梦想与她 on 2019-12-07 04:46:49
Question: I'm trying out TensorFlow and I'm running into a strange error. I edited the deep MNIST example to use another set of images, and the algorithm converges nicely again until around iteration 8000 (91% accuracy at that point), when it crashes with the following error:

tensorflow.python.framework.errors.InvalidArgumentError: ReluGrad input is not finite

At first I thought maybe some coefficients were reaching the limit for a float, but adding L2 regularization on all weights & biases didn't
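One commonly suggested cause for this error in the old deep MNIST tutorial — not necessarily the asker's actual one — is log(0) in the hand-written cross-entropy, which produces an Inf/NaN that then flows back through ReluGrad. A hedged sketch of the usual remedies, written against the TF1-style API via tf.compat.v1 (the placeholders stand in for the tutorial's label and output tensors):

```python
import tensorflow.compat.v1 as tf  # assumption: TF1-style graph code, run under TF 2.x compat
tf.disable_eager_execution()

y_ = tf.placeholder(tf.float32, [None, 10])      # one-hot labels
logits = tf.placeholder(tf.float32, [None, 10])  # stand-in for the network's final layer
y_conv = tf.nn.softmax(logits)

# Tutorial-style loss: log(0) yields -inf, which later surfaces as
# "ReluGrad input is not finite" once a NaN/Inf propagates through the graph.
# cross_entropy = -tf.reduce_sum(y_ * tf.log(y_conv))

# Clipping the softmax output away from 0 keeps the log finite:
cross_entropy = -tf.reduce_sum(y_ * tf.log(tf.clip_by_value(y_conv, 1e-10, 1.0)))

# More robust still: compute the loss from logits directly, which is numerically stable.
stable_loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits_v2(labels=y_, logits=logits))
```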

What's different about this momentum gradient update in TensorFlow and Theano?

Submitted by 非 Y 不嫁゛ on 2019-12-07 03:55:01
Question: I'm trying to use TensorFlow for my deep learning project. Here I need to implement my gradient update following this formula (given as an image in the original post). I have also implemented this part in Theano, and it produced the expected answer. But when I try to use TensorFlow's MomentumOptimizer, the result is really bad. I don't know what is different between them.

Theano:

def gradient_updates_momentum_L2(cost, params, learning_rate, momentum, weight_cost_strength):
    # Make sure momentum is a sane value
    assert momentum < 1 and momentum
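For comparison, a small NumPy sketch of the two update rules as documented: a classic Theano-style momentum step with an L2 (weight-decay) term, and tf.train.MomentumOptimizer's accumulator form, which contains no weight-decay term. With a constant learning rate and the L2 term set to zero the two produce identical trajectories, so a missing weight-cost term is one plausible source of the mismatch; the asker's exact formula is an image and is not reproduced here:

```python
import numpy as np

def theano_style_step(w, v, grad, lr, momentum, weight_cost):
    # v <- momentum * v - lr * (grad + weight_cost * w);  w <- w + v
    v = momentum * v - lr * (grad + weight_cost * w)
    return w + v, v

def tf_momentum_style_step(w, accum, grad, lr, momentum):
    # tf.train.MomentumOptimizer: accum <- momentum * accum + grad;  w <- w - lr * accum
    accum = momentum * accum + grad
    return w - lr * accum, accum

# With weight_cost = 0 and a constant lr the two rules coincide (v == -lr * accum):
w1 = w2 = 1.0
v = accum = 0.0
for grad in [0.3, -0.1, 0.2, 0.05]:
    w1, v = theano_style_step(w1, v, grad, lr=0.1, momentum=0.9, weight_cost=0.0)
    w2, accum = tf_momentum_style_step(w2, accum, grad, lr=0.1, momentum=0.9)
    print(round(w1, 6), round(w2, 6))   # identical sequences
```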
