gradient-descent

LMS batch gradient descent with NumPy

你。 Submitted on 2019-12-11 09:45:33
Question: I'm trying to write some very simple LMS batch gradient descent, but I believe I'm doing something wrong with the gradient. The elements of theta differ greatly in order of magnitude relative to their initial values, so either theta[2] doesn't move (e.g. if alpha = 1e-8) or theta[1] shoots off (e.g. if alpha = .01). import numpy as np y = np.array([[400], [330], [369], [232], [540]]) x = np.array([[2104,3], [1600,3], [2400,3], [1416,2], [3000,4]]) x = np.concatenate(
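A minimal sketch (not the asker's code) of batch LMS gradient descent on the same data, assuming the issue is the very different feature scales: standardizing each column lets a single learning rate work for every component of theta. All names here are illustrative.

```python
import numpy as np

y = np.array([400.0, 330.0, 369.0, 232.0, 540.0])
x = np.array([[2104, 3], [1600, 3], [2400, 3], [1416, 2], [3000, 4]], dtype=float)

# Standardize each feature column so all features share a comparable scale.
x = (x - x.mean(axis=0)) / x.std(axis=0)
X = np.concatenate([np.ones((x.shape[0], 1)), x], axis=1)  # add intercept column

theta = np.zeros(X.shape[1])
alpha = 0.1
for _ in range(1000):
    grad = X.T @ (X @ theta - y) / len(y)  # gradient of (1/2m)*||X theta - y||^2
    theta -= alpha * grad

print(theta)
```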

Why does the logistic regression cost go negative and not behave correctly?

无人久伴 Submitted on 2019-12-11 05:27:43
Question: I am implementing logistic regression in Matlab. The data is normalized (mean and std). I understand that, depending on your learning rate, you may overshoot the optimal point. But doesn't that mean your cost starts going up? In my case the cost goes into negative territory, and I don't understand why. Here is the (I think standard) cost and weight update rule: function J = crossEntropyError(w, x, y) h = sigmoid(x*w); J = (-y'*log(h) - (1-y')*log(1-h)); end Weight update: function w = updateWeights
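A NumPy sketch (the question uses Matlab) of the binary cross-entropy cost for reference: with labels y in {0, 1} and predictions h in (0, 1), every term is non-negative, so the cost can only go negative if y or h leaves that range. Function and variable names are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cross_entropy(w, X, y):
    h = sigmoid(X @ w)
    eps = 1e-12                     # clip to avoid log(0)
    h = np.clip(h, eps, 1 - eps)
    # mean of -y*log(h) - (1-y)*log(1-h); each term is >= 0 for y in {0, 1}
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))
```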

Tensorflow 2.0 doesn't compute the gradient

拟墨画扇 Submitted on 2019-12-11 04:06:29
Question: I want to visualize the patterns that a given feature map in a CNN has learned (in this example I'm using vgg16). To do so I create a random image, feed it through the network up to the desired convolutional layer, choose the feature map, and find the gradients with respect to the input. The idea is to change the input in a way that maximizes the activation of the desired feature map. Using tensorflow 2.0 I have a GradientTape that follows the function and then computes the gradient,
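A minimal sketch of the usual TF2 pattern for this, assuming a Keras VGG16; the layer name, feature-map index, step size, and iteration count are illustrative. The key detail is watching the input tensor explicitly, since it is not a Variable.

```python
import tensorflow as tf

vgg = tf.keras.applications.VGG16(weights="imagenet", include_top=False)
feature_extractor = tf.keras.Model(inputs=vgg.input,
                                   outputs=vgg.get_layer("block3_conv1").output)

image = tf.random.uniform((1, 224, 224, 3))
for _ in range(30):
    with tf.GradientTape() as tape:
        tape.watch(image)                          # input is a plain tensor, so watch it
        activation = feature_extractor(image)
        loss = tf.reduce_mean(activation[..., 7])  # mean activation of one feature map
    grads = tape.gradient(loss, image)
    grads = tf.math.l2_normalize(grads)
    image = image + 0.1 * grads                    # gradient ascent on the input
```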

calculate gradient output for Theta update rule

我与影子孤独终老i Submitted on 2019-12-11 02:20:48
Question: Since this uses a sigmoid function instead of a zero/one activation function, I guess this is the right way to calculate gradient descent, is that right? static double calculateOutput( int theta, double weights[], double[][] feature_matrix, int file_index, int globo_dict_size ) { //double sum = x * weights[0] + y * weights[1] + z * weights[2] + weights[3]; double sum = 0.0; for (int i = 0; i < globo_dict_size; i++) { sum += ( weights[i] * feature_matrix[file_index][i] ); } //bias sum += weights[
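A NumPy sketch (the question's code is Java) of what a single sigmoid unit's output and gradient update usually look like; the sigmoid derivative term out*(1-out) is what distinguishes it from the zero/one perceptron rule. Names and the learning rate are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def calculate_output(weights, features):
    # weighted sum plus bias (last weight), passed through the sigmoid
    return sigmoid(np.dot(weights[:-1], features) + weights[-1])

def update_weights(weights, features, label, lr=0.1):
    out = calculate_output(weights, features)
    error = label - out
    grad = error * out * (1.0 - out)   # sigmoid derivative enters the gradient
    weights[:-1] += lr * grad * features
    weights[-1] += lr * grad           # bias update
    return weights
```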

Trying to understand code that computes the gradient w.r.t. the input for LogSoftMax in Torch

空扰寡人 Submitted on 2019-12-10 12:02:18
Question: The code comes from: https://github.com/torch/nn/blob/master/lib/THNN/generic/LogSoftMax.c I don't see how this code is computing the gradient w.r.t. the input for the module LogSoftMax. What I'm confused about is what the two for loops are doing. for (t = 0; t < nframe; t++) { sum = 0; gradInput_data = gradInput_data0 + dim*t; output_data = output_data0 + dim*t; gradOutput_data = gradOutput_data0 + dim*t; for (d = 0; d < dim; d++) sum += gradOutput_data[d]; for (d = 0; d < dim; d++) gradInput
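A NumPy sketch of what those two inner loops compute per frame: for y = log_softmax(x), the backward pass is dL/dx_d = dL/dy_d - exp(y_d) * sum_k dL/dy_k, where exp(y) is just softmax(x). The first loop accumulates the sum, the second applies the formula element-wise.

```python
import numpy as np

def log_softmax_backward(grad_output, output):
    # output is the forward result log_softmax(x); exp(output) recovers softmax(x)
    s = grad_output.sum(axis=-1, keepdims=True)    # first inner loop: sum over dim
    return grad_output - np.exp(output) * s        # second inner loop, vectorized
```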

How does one use Hermite polynomials with Stochastic Gradient Descent (SGD)?

会有一股神秘感。 Submitted on 2019-12-09 23:13:03
Question: I was trying to train a simple polynomial linear model with pytorch using Hermite polynomials, since they seem to have a better conditioned Hessian. To do that I decided to use hermvander, since it gives the Vandermonde matrix with each entry being a Hermite term. I just made my feature vectors the output of hermvander: Kern_train = hermvander(X_train,Degree_mdl) However, when I proceeded to train I got NaN all the time. I suspected it could have been a step size issue, but I
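A sketch of one common workaround, assuming the NaNs come from the large dynamic range of the Hermite columns: rescale each column of the hermvander output and use a small SGD step size. The data, degree, and learning rate here are illustrative, not the asker's.

```python
import numpy as np
import torch
from numpy.polynomial.hermite import hermvander

X_train = np.linspace(-1, 1, 100)
y_train = np.sin(2 * np.pi * X_train)

degree = 6
K = hermvander(X_train, degree)                   # shape (100, degree + 1)
K = K / np.linalg.norm(K, axis=0, keepdims=True)  # rescale each Hermite column

K_t = torch.tensor(K, dtype=torch.float32)
y_t = torch.tensor(y_train, dtype=torch.float32).unsqueeze(1)

w = torch.zeros(degree + 1, 1, requires_grad=True)
opt = torch.optim.SGD([w], lr=1e-2)
for _ in range(2000):
    opt.zero_grad()
    loss = torch.mean((K_t @ w - y_t) ** 2)   # plain mean-squared error
    loss.backward()
    opt.step()
```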

"setting an array element with a sequence" error in scikit-learn GradientBoostingClassifier

南楼画角 Submitted on 2019-12-09 04:38:26
Question: Here is my code; does anyone have any idea what is wrong? The error happens when I call fit . import pandas as pd import numpy as np from sklearn.ensemble import (RandomTreesEmbedding, RandomForestClassifier, GradientBoostingClassifier) from sklearn.model_selection import train_test_split from sklearn.feature_extraction.text import CountVectorizer n_estimators = 10 d = {'f1': [1, 2], 'f2': ['foo goo', 'goo zoo'], 'target':[0, 1]} df = pd.DataFrame(data=d) X_train, X_test, y_train, y_test = train_test_split(df, df['target'], test_size=0.1) X_train['f2'] = CountVectorizer().fit_transform(X_train['f2
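A sketch of one common fix, assuming the error comes from assigning the sparse CountVectorizer output back into a single DataFrame column (each cell then holds a whole row of counts): keep the text features as their own matrix and hstack them with the numeric columns before fitting.

```python
import pandas as pd
from scipy.sparse import hstack, csr_matrix
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_extraction.text import CountVectorizer

d = {'f1': [1, 2], 'f2': ['foo goo', 'goo zoo'], 'target': [0, 1]}
df = pd.DataFrame(data=d)

text_features = CountVectorizer().fit_transform(df['f2'])   # sparse (n_samples, vocab)
numeric_features = csr_matrix(df[['f1']].values)            # sparse (n_samples, 1)
X = hstack([numeric_features, text_features]).tocsr()       # one combined matrix

clf = GradientBoostingClassifier(n_estimators=10)
clf.fit(X, df['target'])
```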

gradient descent as applied to feature vector bag of words classification task

本小妞迷上赌 Submitted on 2019-12-08 09:25:38
Question: I've watched the Andrew Ng videos over and over, and I still don't understand how to apply gradient descent to my problem. He deals almost exclusively in high-level conceptual explanations, but what I need are ground-level tactical insights. My inputs are feature vectors of the form: Example: Document 1 = ["I", "am", "awesome"] Document 2 = ["I", "am", "great", "great"] Dictionary is: ["I", "am", "awesome", "great"] So the documents as vectors would look like: Document 1 = [1,
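A small NumPy sketch of how those bag-of-words counts feed a gradient-descent classifier: each document becomes a count vector over the dictionary, and logistic-regression gradient descent runs on that matrix. The class labels, learning rate, and iteration count here are invented for illustration.

```python
import numpy as np

dictionary = ["I", "am", "awesome", "great"]
docs = [["I", "am", "awesome"], ["I", "am", "great", "great"]]
labels = np.array([0.0, 1.0])   # hypothetical labels, one per document

# Count vector per document over the dictionary, plus a bias column of ones.
X = np.array([[doc.count(w) for w in dictionary] for doc in docs], dtype=float)
X = np.hstack([X, np.ones((len(docs), 1))])

theta = np.zeros(X.shape[1])
alpha = 0.5
for _ in range(200):
    h = 1.0 / (1.0 + np.exp(-X @ theta))              # sigmoid predictions
    theta -= alpha * X.T @ (h - labels) / len(labels) # batch gradient step
```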

Tensorflow gradient is always zero

时间秒杀一切 Submitted on 2019-12-08 08:56:09
Question: I have written a small Tensorflow program which convolves an image patch with the same convolution kernel num_unrollings times in a row, and then attempts to minimize the mean squared difference between the resulting values and a target output. However, when I run the model with num_unrollings greater than 1, the gradient of my loss ( tf_loss ) term with respect to the convolution kernel ( tf_kernel ) is zero, so no learning occurs. Here is the smallest code (python 3) I can come up with
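A minimal working reference, in TF2 eager style rather than the asker's code, of convolving a patch repeatedly with the same kernel and checking that the gradient w.r.t. the kernel is non-zero; shapes and the target are illustrative. Comparing against a setup like this can help isolate where the gradient path breaks.

```python
import tensorflow as tf

num_unrollings = 3
image = tf.random.normal((1, 16, 16, 1))
target = tf.random.normal((1, 16, 16, 1))
kernel = tf.Variable(tf.random.normal((3, 3, 1, 1)))   # a Variable so it is tracked

with tf.GradientTape() as tape:
    out = image
    for _ in range(num_unrollings):
        out = tf.nn.conv2d(out, kernel, strides=1, padding="SAME")
    loss = tf.reduce_mean(tf.square(out - target))      # mean squared difference

grad = tape.gradient(loss, kernel)
print(tf.reduce_sum(tf.abs(grad)))   # non-zero if gradients flow through the unrolled loop
```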

Linear regression gradient descent algorithms in R produce varying results

|▌冷眼眸甩不掉的悲伤 Submitted on 2019-12-08 07:51:20
Question: I am trying to implement linear regression in R from scratch, without using any packages or libraries, on the following data: UCI Machine Learning Repository, Bike-Sharing-Dataset. The linear regression itself was easy enough; here is the code: data <- read.csv("Bike-Sharing-Dataset/hour.csv") # Select the usable features data1 <- data[, c("season", "mnth", "hr", "holiday", "weekday", "workingday", "weathersit", "temp", "atemp", "hum", "windspeed", "cnt")] # Split the data trainingObs<-sample
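A NumPy sketch (the question is in R) of one way to check why two gradient-descent implementations disagree: compare the iterative fit against the closed-form least-squares solution on the same data; with reasonably scaled features and enough iterations they should match. The synthetic data, learning rate, and iteration count are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 4.0 + rng.normal(scale=0.1, size=200)

Xb = np.hstack([np.ones((len(X), 1)), X])          # add intercept column

# Closed-form (least-squares) reference solution.
beta_exact, *_ = np.linalg.lstsq(Xb, y, rcond=None)

# Batch gradient descent on the same objective.
beta = np.zeros(Xb.shape[1])
alpha = 0.1
for _ in range(5000):
    beta -= alpha * Xb.T @ (Xb @ beta - y) / len(y)

print(np.allclose(beta, beta_exact, atol=1e-4))    # should print True when both converge
```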