gradient-descent

LMS batch gradient descent with NumPy

你。 Submitted on 2019-12-11 09:45:33
Question: I'm trying to write some very simple LMS batch gradient descent, but I believe I'm doing something wrong with the gradient. The elements of theta differ greatly in order of magnitude relative to their initial values, so either theta[2] doesn't move (e.g. if alpha = 1e-8) or theta[1] shoots off (e.g. if alpha = .01). import numpy as np y = np.array([[400], [330], [369], [232], [540]]) x = np.array([[2104,3], [1600,3], [2400,3], [1416,2], [3000,4]]) x = np.concatenate(
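A minimal sketch (not the asker's code) of batch LMS gradient descent on the same data, assuming the issue is the very different feature scales: standardizing each column lets a single learning rate work for every component of theta. All names here are illustrative.

```python
import numpy as np

y = np.array([400.0, 330.0, 369.0, 232.0, 540.0])
x = np.array([[2104, 3], [1600, 3], [2400, 3], [1416, 2], [3000, 4]], dtype=float)

# Standardize each feature column so all features share a comparable scale.
x = (x - x.mean(axis=0)) / x.std(axis=0)
X = np.concatenate([np.ones((x.shape[0], 1)), x], axis=1)  # add intercept column

theta = np.zeros(X.shape[1])
alpha = 0.1
for _ in range(1000):
    grad = X.T @ (X @ theta - y) / len(y)  # gradient of (1/2m)*||X theta - y||^2
    theta -= alpha * grad

print(theta)
```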

Why does the logistic regression cost go negative and not behave correctly?

无人久伴 Submitted on 2019-12-11 05:27:43
Question: I am implementing logistic regression in Matlab. The data is normalized (mean and std). I understand that, depending on your learning rate, you may overshoot the optimal point. But doesn't that mean your cost starts going up? In my case the cost goes into negative territory, and I don't understand why. Here is the (I think standard) cost and weight update rule: function J = crossEntropyError(w, x, y) h = sigmoid(x*w); J = (-y'*log(h) - (1-y')*log(1-h)); end Weight update: function w = updateWeights
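A NumPy sketch (the question uses Matlab) of the binary cross-entropy cost for reference: with labels y in {0, 1} and predictions h in (0, 1), every term is non-negative, so the cost can only go negative if y or h leaves that range. Function and variable names are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cross_entropy(w, X, y):
    h = sigmoid(X @ w)
    eps = 1e-12                     # clip to avoid log(0)
    h = np.clip(h, eps, 1 - eps)
    # mean of -y*log(h) - (1-y)*log(1-h); each term is >= 0 for y in {0, 1}
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))
```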

Tensorflow 2.0 doesn't compute the gradient

拟墨画扇 Submitted on 2019-12-11 04:06:29
Question: I want to visualize the patterns that a given feature map in a CNN has learned (in this example I'm using vgg16). To do so I create a random image, feed it through the network up to the desired convolutional layer, choose the feature map, and find the gradients with respect to the input. The idea is to change the input in a way that maximizes the activation of the desired feature map. Using tensorflow 2.0 I have a GradientTape that follows the function and then computes the gradient,
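A minimal sketch of the usual TF2 pattern for this, assuming a Keras VGG16; the layer name, feature-map index, step size, and iteration count are illustrative. The key detail is watching the input tensor explicitly, since it is not a Variable.

```python
import tensorflow as tf

vgg = tf.keras.applications.VGG16(weights="imagenet", include_top=False)
feature_extractor = tf.keras.Model(inputs=vgg.input,
                                   outputs=vgg.get_layer("block3_conv1").output)

image = tf.random.uniform((1, 224, 224, 3))
for _ in range(30):
    with tf.GradientTape() as tape:
        tape.watch(image)                          # input is a plain tensor, so watch it
        activation = feature_extractor(image)
        loss = tf.reduce_mean(activation[..., 7])  # mean activation of one feature map
    grads = tape.gradient(loss, image)
    grads = tf.math.l2_normalize(grads)
    image = image + 0.1 * grads                    # gradient ascent on the input
```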

calculate gradient output for Theta update rule

我与影子孤独终老i Submitted on 2019-12-11 02:20:48
Question: Since this uses a sigmoid function instead of a zero/one activation function, I guess this is the right way to calculate gradient descent, is that right? static double calculateOutput( int theta, double weights[], double[][] feature_matrix, int file_index, int globo_dict_size ) { //double sum = x * weights[0] + y * weights[1] + z * weights[2] + weights[3]; double sum = 0.0; for (int i = 0; i < globo_dict_size; i++) { sum += ( weights[i] * feature_matrix[file_index][i] ); } //bias sum += weights[
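A NumPy sketch (the question's code is Java) of what a single sigmoid unit's output and gradient update usually look like; the sigmoid derivative term out*(1-out) is what distinguishes it from the zero/one perceptron rule. Names and the learning rate are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def calculate_output(weights, features):
    # weighted sum plus bias (last weight), passed through the sigmoid
    return sigmoid(np.dot(weights[:-1], features) + weights[-1])

def update_weights(weights, features, label, lr=0.1):
    out = calculate_output(weights, features)
    error = label - out
    grad = error * out * (1.0 - out)   # sigmoid derivative enters the gradient
    weights[:-1] += lr * grad * features
    weights[-1] += lr * grad           # bias update
    return weights
```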

Trying to understand code that computes the gradient w.r.t. the input for LogSoftMax in Torch

空扰寡人 Submitted on 2019-12-10 12:02:18
Question: The code comes from: https://github.com/torch/nn/blob/master/lib/THNN/generic/LogSoftMax.c I don't see how this code is computing the gradient w.r.t. the input for the module LogSoftMax. What I'm confused about is what the two for loops are doing. for (t = 0; t < nframe; t++) { sum = 0; gradInput_data = gradInput_data0 + dim*t; output_data = output_data0 + dim*t; gradOutput_data = gradOutput_data0 + dim*t; for (d = 0; d < dim; d++) sum += gradOutput_data[d]; for (d = 0; d < dim; d++) gradInput
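A NumPy sketch of what those two inner loops compute per frame: for y = log_softmax(x), the backward pass is dL/dx_d = dL/dy_d - exp(y_d) * sum_k dL/dy_k, where exp(y) is just softmax(x). The first loop accumulates the sum, the second applies the formula element-wise.

```python
import numpy as np

def log_softmax_backward(grad_output, output):
    # output is the forward result log_softmax(x); exp(output) recovers softmax(x)
    s = grad_output.sum(axis=-1, keepdims=True)    # first inner loop: sum over dim
    return grad_output - np.exp(output) * s        # second inner loop, vectorized
```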

How does one use Hermite polynomials with Stochastic Gradient Descent (SGD)?

会有一股神秘感。 Submitted on 2019-12-09 23:13:03
Question: I was trying to train a simple polynomial linear model with pytorch using Hermite polynomials, since they seem to have a better conditioned Hessian. To do that I decided to use hermvander, since it gives the Vandermonde matrix with each entry being a Hermite term. I just made my feature vectors the output of hermvander: Kern_train = hermvander(X_train,Degree_mdl) However, when I proceeded to train I got NaN all the time. I suspected it could have been a step size issue, but I
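A sketch of one common workaround, assuming the NaNs come from the large dynamic range of the Hermite columns: rescale each column of the hermvander output and use a small SGD step size. The data, degree, and learning rate here are illustrative, not the asker's.

```python
import numpy as np
import torch
from numpy.polynomial.hermite import hermvander

X_train = np.linspace(-1, 1, 100)
y_train = np.sin(2 * np.pi * X_train)

degree = 6
K = hermvander(X_train, degree)                   # shape (100, degree + 1)
K = K / np.linalg.norm(K, axis=0, keepdims=True)  # rescale each Hermite column

K_t = torch.tensor(K, dtype=torch.float32)
y_t = torch.tensor(y_train, dtype=torch.float32).unsqueeze(1)

w = torch.zeros(degree + 1, 1, requires_grad=True)
opt = torch.optim.SGD([w], lr=1e-2)
for _ in range(2000):
    opt.zero_grad()
    loss = torch.mean((K_t @ w - y_t) ** 2)   # plain mean-squared error
    loss.backward()
    opt.step()
```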

"setting an array element with a sequence" error in scikit-learn GradientBoostingClassifier

南楼画角 Submitted on 2019-12-09 04:38:26
Question: Here is my code; does anyone have any idea what is wrong? The error happens when I call fit . import pandas as pd import numpy as np from sklearn.ensemble import (RandomTreesEmbedding, RandomForestClassifier, GradientBoostingClassifier) from sklearn.model_selection import train_test_split from sklearn.feature_extraction.text import CountVectorizer n_estimators = 10 d = {'f1': [1, 2], 'f2': ['foo goo', 'goo zoo'], 'target':[0, 1]} df = pd.DataFrame(data=d) X_train, X_test, y_train, y_test = train_test_split(df, df['target'], test_size=0.1) X_train['f2'] = CountVectorizer().fit_transform(X_train['f2
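A sketch of one common fix, assuming the error comes from assigning the sparse CountVectorizer output back into a single DataFrame column (each cell then holds a whole row of counts): keep the text features as their own matrix and hstack them with the numeric columns before fitting.

```python
import pandas as pd
from scipy.sparse import hstack, csr_matrix
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_extraction.text import CountVectorizer

d = {'f1': [1, 2], 'f2': ['foo goo', 'goo zoo'], 'target': [0, 1]}
df = pd.DataFrame(data=d)

text_features = CountVectorizer().fit_transform(df['f2'])   # sparse (n_samples, vocab)
numeric_features = csr_matrix(df[['f1']].values)            # sparse (n_samples, 1)
X = hstack([numeric_features, text_features]).tocsr()       # one combined matrix

clf = GradientBoostingClassifier(n_estimators=10)
clf.fit(X, df['target'])
```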

gradient descent as applied to feature vector bag of words classification task

本小妞迷上赌 Submitted on 2019-12-08 09:25:38
Question: I've watched the Andrew Ng videos over and over, and I still don't understand how to apply gradient descent to my problem. He deals almost exclusively in high-level conceptual explanations, but what I need are ground-level tactical insights. My inputs are feature vectors of the form: Example: Document 1 = ["I", "am", "awesome"] Document 2 = ["I", "am", "great", "great"] Dictionary is: ["I", "am", "awesome", "great"] So the documents as vectors would look like: Document 1 = [1,
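A small NumPy sketch of how those bag-of-words counts feed a gradient-descent classifier: each document becomes a count vector over the dictionary, and logistic-regression gradient descent runs on that matrix. The class labels, learning rate, and iteration count here are invented for illustration.

```python
import numpy as np

dictionary = ["I", "am", "awesome", "great"]
docs = [["I", "am", "awesome"], ["I", "am", "great", "great"]]
labels = np.array([0.0, 1.0])   # hypothetical labels, one per document

# Count vector per document over the dictionary, plus a bias column of ones.
X = np.array([[doc.count(w) for w in dictionary] for doc in docs], dtype=float)
X = np.hstack([X, np.ones((len(docs), 1))])

theta = np.zeros(X.shape[1])
alpha = 0.5
for _ in range(200):
    h = 1.0 / (1.0 + np.exp(-X @ theta))              # sigmoid predictions
    theta -= alpha * X.T @ (h - labels) / len(labels) # batch gradient step
```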

Tensorflow gradient is always zero

时间秒杀一切 Submitted on 2019-12-08 08:56:09
Question: I have written a small Tensorflow program which convolves an image patch with the same convolution kernel num_unrollings times in a row, and then attempts to minimize the mean squared difference between the resulting values and a target output. However, when I run the model with num_unrollings greater than 1, the gradient of my loss ( tf_loss ) term with respect to the convolution kernel ( tf_kernel ) is zero, so no learning occurs. Here is the smallest code (python 3) I can come up with
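A minimal working reference, in TF2 eager style rather than the asker's code, of convolving a patch repeatedly with the same kernel and checking that the gradient w.r.t. the kernel is non-zero; shapes and the target are illustrative. Comparing against a setup like this can help isolate where the gradient path breaks.

```python
import tensorflow as tf

num_unrollings = 3
image = tf.random.normal((1, 16, 16, 1))
target = tf.random.normal((1, 16, 16, 1))
kernel = tf.Variable(tf.random.normal((3, 3, 1, 1)))   # a Variable so it is tracked

with tf.GradientTape() as tape:
    out = image
    for _ in range(num_unrollings):
        out = tf.nn.conv2d(out, kernel, strides=1, padding="SAME")
    loss = tf.reduce_mean(tf.square(out - target))      # mean squared difference

grad = tape.gradient(loss, kernel)
print(tf.reduce_sum(tf.abs(grad)))   # non-zero if gradients flow through the unrolled loop
```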

Linear regression gradient descent algorithms in R produce varying results

|▌冷眼眸甩不掉的悲伤 Submitted on 2019-12-08 07:51:20
Question: I am trying to implement linear regression in R from scratch, without using any packages or libraries, on the following data: UCI Machine Learning Repository, Bike-Sharing-Dataset. The linear regression itself was easy enough; here is the code: data <- read.csv("Bike-Sharing-Dataset/hour.csv") # Select the usable features data1 <- data[, c("season", "mnth", "hr", "holiday", "weekday", "workingday", "weathersit", "temp", "atemp", "hum", "windspeed", "cnt")] # Split the data trainingObs<-sample
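A NumPy sketch (the question is in R) of one way to check why two gradient-descent implementations disagree: compare the iterative fit against the closed-form least-squares solution on the same data; with reasonably scaled features and enough iterations they should match. The synthetic data, learning rate, and iteration count are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 4.0 + rng.normal(scale=0.1, size=200)

Xb = np.hstack([np.ones((len(X), 1)), X])          # add intercept column

# Closed-form (least-squares) reference solution.
beta_exact, *_ = np.linalg.lstsq(Xb, y, rcond=None)

# Batch gradient descent on the same objective.
beta = np.zeros(Xb.shape[1])
alpha = 0.1
for _ in range(5000):
    beta -= alpha * Xb.T @ (Xb @ beta - y) / len(y)

print(np.allclose(beta, beta_exact, atol=1e-4))    # should print True when both converge
```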