How to determine the learning rate and the variance in a gradient descent algorithm?

Plotting is the best way to see how your algorithm is performing. To check whether you have reached convergence, plot the evolution of the cost function after each iteration; after a certain number of iterations you will see that it does not improve much, and you can assume convergence. Take a look at the following code:

# cost_f records the value of the cost function J(theta0, theta1) at every iteration;
# data, m, learning_rate, variance, create_hypothesis and the initial values of
# theta0_guess/theta1_guess and theta0_last/theta1_last come from the question's code.
cost_f = []
while (abs(theta1_guess - theta1_last) > variance or abs(theta0_guess - theta0_last) > variance):

    theta1_last = theta1_guess
    theta0_last = theta0_guess

    hypothesis = create_hypothesis(theta1_guess, theta0_guess)
    cost_f.append((1./(2*m)) * sum([pow(hypothesis(point[0]) - point[1], 2) for point in data]))

    # simultaneous update of both parameters with the usual gradient descent rule
    theta0_guess = theta0_guess - learning_rate * (1./m) * sum([hypothesis(point[0]) - point[1] for point in data])
    theta1_guess = theta1_guess - learning_rate * (1./m) * sum([(hypothesis(point[0]) - point[1]) * point[0] for point in data])

# plot the cost per iteration to inspect convergence
import pylab
pylab.plot(range(len(cost_f)), cost_f)
pylab.show()

This will plot the following graph (run with learning_rate=0.01 and variance=0.00001):

As you can see, after a thousand iterations you don't get much improvement. I normally declare convergence if the cost function decreases by less than 0.001 in one iteration, but this is just based on my own experience.
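A minimal sketch of that stopping rule, assuming the cost_f list built in the loop above (the has_converged name and the 0.001 tolerance are just illustrative):

def has_converged(cost_f, tol=0.001):
    # hypothetical helper: declare convergence once the cost decreases
    # by less than tol between two consecutive iterations
    if len(cost_f) < 2:
        return False
    return (cost_f[-2] - cost_f[-1]) < tol

Inside the loop you would then break as soon as has_converged(cost_f) returns True, either instead of or in addition to the variance check on the thetas.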

For choosing the learning rate, the best thing you can do is, again, to plot the cost function and see how it behaves, and always remember these two things:

  • if the learning rate is too small you will get slow convergence
  • if the learning rate is too large your cost function may not decrease in every iteration and therefore it will not converge

If you run your code with learning_rate > 0.029 and variance=0.001 you will be in the second case: gradient descent doesn't converge. If instead you choose learning_rate < 0.0001 with variance=0.001, you will see that your algorithm takes a lot of iterations to converge.

Non-convergence example with learning_rate=0.03

Slow convergence example with learning_rate=0.0001
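If you want to compare several step sizes at once, a minimal self-contained sketch like the one below makes the difference visible on a single plot. The synthetic data, the linear hypothesis and the run_gradient_descent helper are my own stand-ins for the question's data and create_hypothesis, not part of the original code:

import pylab

# stand-in data: points on the line y = 2x + 1 (an assumption, not the question's data)
data = [(x, 2.0 * x + 1.0) for x in range(20)]
m = len(data)

def run_gradient_descent(learning_rate, iterations=1000):
    theta0, theta1 = 0.0, 0.0
    costs = []
    for _ in range(iterations):
        hypothesis = lambda x: theta1 * x + theta0
        # record the cost before updating the parameters
        costs.append((1. / (2 * m)) * sum((hypothesis(x) - y) ** 2 for x, y in data))
        grad0 = (1. / m) * sum(hypothesis(x) - y for x, y in data)
        grad1 = (1. / m) * sum((hypothesis(x) - y) * x for x, y in data)
        theta0 = theta0 - learning_rate * grad0
        theta1 = theta1 - learning_rate * grad1
    return costs

# one cost curve per candidate learning rate: the smaller the rate, the flatter the curve
for lr in (0.0001, 0.001, 0.01):
    pylab.plot(run_gradient_descent(lr), label="learning_rate=%s" % lr)
pylab.legend()
pylab.show()

A rate large enough to make the cost grow from one iteration to the next (like 0.03 on your data) would show up as a curve that shoots upward instead of flattening out.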

There are a bunch of ways to guarantee convergence of a gradient descent algorithm. There is line search, a fixed step size related to the Lipschitz constant of the gradient (for an analytic function; for tabulated data such as yours, you can approximate it from the differences between consecutive values), a decreasing step size at each iteration, and some others. Some of them can be found here.
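As one example of the decreasing step size idea, here is a minimal sketch that reuses the synthetic data and gradients from the sketch above; the base_rate value and the 1/(1+k) schedule are just one common choice, not the only one:

def run_decreasing_step(base_rate=0.01, iterations=1000):
    theta0, theta1 = 0.0, 0.0
    costs = []
    for k in range(iterations):
        learning_rate = base_rate / (1.0 + k)  # step size shrinks every iteration
        hypothesis = lambda x: theta1 * x + theta0
        costs.append((1. / (2 * m)) * sum((hypothesis(x) - y) ** 2 for x, y in data))
        grad0 = (1. / m) * sum(hypothesis(x) - y for x, y in data)
        grad1 = (1. / m) * sum((hypothesis(x) - y) * x for x, y in data)
        theta0 = theta0 - learning_rate * grad0
        theta1 = theta1 - learning_rate * grad1
    return costs

Because the step size keeps shrinking, late iterations can no longer overshoot, which is the behaviour a too-large fixed rate suffers from.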
