问题
I understand what Gradient Descent does. Basically it tries to move towards the local optimal solution by slowly moving down the curve. I am trying to understand what is the actual difference between the plan gradient descent and the newton's method?
From Wikipedia, I read this short line "Newton's method uses curvature information to take a more direct route." What does this intuitively mean?
回答1:
At a local minimum (or maximum) x
, the derivative of the target function f
vanishes: f'(x) = 0
(assuming sufficient smoothness of f
).
Gradient descent tries to find such a minimum x
by using information from the first derivative of f
: It simply follows the steepest descent from the current point. This is like rolling a ball down the graph of f
until it comes to rest (while neglecting inertia).
Newton's method tries to find a point x
satisfying f'(x) = 0
by approximating f'
with a linear function g
and then solving for the root of that function explicitely (this is called Newton's root-finding method). The root of g
is not necessarily the root of f'
, but it is under many circumstances a good guess (the Wikipedia article on Newton's method for root finding has more information on convergence criteria). While approximating f'
, Newton's method makes use of f''
(the curvature of f
). This means it has higher requirements on the smoothness of f
, but it also means that (by using more information) it often converges faster.
回答2:
Put simply, gradient descent you just take a small step towards where you think the zero is and then recalculate; Newton's method, you go all the way there.
回答3:
Edit 2017: The original link is dead - but the way back machine still got it :) https://web.archive.org/web/20151122203025/http://www.cs.colostate.edu/~anderson/cs545/Lectures/week6day2/week6day2.pdf
this power point the main ideas are explained simply http://www.cs.colostate.edu/~anderson/cs545/Lectures/week6day2/week6day2.pdf
I hope this help :)
来源:https://stackoverflow.com/questions/12066761/what-is-the-difference-between-gradient-descent-and-newtons-gradient-descent