Question
I have implemented the following code for gradient descent using vectorization, but the cost function does not seem to be decreasing correctly. Instead, the cost function increases with each iteration.
Assume theta is an (n+1)-vector, y is an m-vector, and X is the m×(n+1) design matrix.
function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
  m = length(y);      % number of training examples
  n = length(theta);  % number of features
  J_history = zeros(num_iters, 1);
  error = ((theta' * X')' - y) * (alpha/m);
  descent = zeros(size(theta), 1);
  for iter = 1:num_iters
    for i = 1:n
      descent(i) = descent(i) + sum(error .* X(:,i));
      i = i + 1;
    end
    theta = theta - descent;
    J_history(iter) = computeCost(X, y, theta);
    disp("the value of cost function is : "), disp(J_history(iter));
    iter = iter + 1;
  end
The computeCost function is:
function J = computeCost(X, y, theta)
  m = length(y);
  J = 0;
  for i = 1:m,
    H = theta' * X(i,:)';
    E = H - y(i);
    SQE = E^2;
    J = (J + SQE);
    i = i + 1;
  end;
  J = J / (2*m);
Answer 1:
You can vectorise it even further. Note that in your version the error term is computed only once, before the loop, and descent keeps accumulating across iterations, so the update overshoots and the cost grows; the gradient has to be recomputed from the current theta on every iteration:
function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
  m = length(y);
  J_history = zeros(num_iters, 1);
  for iter = 1:num_iters
    delta = (theta' * X' - y') * X;
    theta = theta - alpha/m * delta';
    J_history(iter) = computeCost(X, y, theta);
  end
end
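As a quick sanity check, here is a minimal sketch on a made-up two-point dataset (the data, learning rate, and iteration count are illustrative assumptions; it assumes computeCost is defined as in the question or as in the second answer below), showing the vectorized update driving the cost down:

% Hypothetical toy data: fit y = 1 + 2*x on two points.
X = [1 1; 1 2];          % design matrix with a bias column
y = [3; 5];
theta0 = zeros(2, 1);

[theta, J_history] = gradientDescent(X, y, theta0, 0.1, 1500);
disp(theta);             % should approach [1; 2]
disp(J_history(end));    % cost should be near 0, and J_history non-increasing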
Answer 2:
You can vectorize it better as follows:
function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
  m = length(y);
  J_history = zeros(num_iters, 1);
  for iter = 1:num_iters
    theta = theta - (alpha/m) * ((X*theta - y)' * X)';
    J_history(iter) = computeCost(X, y, theta);
  end;
end;
The computeCost function can be written as:
function J = computeCost(X, y, theta)
  m = length(y);
  J = 1/(2*m) * sum((X*theta - y).^2);  % element-wise square; plain ^2 would fail on a vector
end;
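Note that the one-line update above is algebraically the same as delta in the first answer, since (X*theta - y)' equals theta'*X' - y'. A minimal sketch (with made-up numbers) checking that this vectorized cost matches the loop-based computeCost from the question:

% Hypothetical check: vectorized cost vs. the question's loop-based cost.
X = [1 1; 1 2; 1 3];   % made-up design matrix with a bias column
y = [2; 3; 5];
theta = [0.5; 1.0];

m = length(y);
J_vec = 1/(2*m) * sum((X*theta - y).^2);        % vectorized form (this answer)

J_loop = 0;                                      % loop form (as in the question)
for i = 1:m
  J_loop = J_loop + (theta' * X(i,:)' - y(i))^2;
end
J_loop = J_loop / (2*m);

disp(abs(J_vec - J_loop));   % should print (near) zero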
Source: https://stackoverflow.com/questions/26656640/octave-code-for-gradient-descent-using-vectorization-not-updating-cost-function