See my master's thesis for a very similar list:
Optimization algorithms for neural networks
- Gradient based
  - Flavours of gradient descent (using only the first-order gradient); see the update-rule sketch after this list:
    - Stochastic gradient descent
    - Mini-Batch gradient descent
    - Learning Rate Scheduling:
      - Momentum
      - RProp and the mini-batch version RMSProp
      - AdaGrad
      - Adadelta
      - Exponential Decay Learning Rate
      - Performance Scheduling
      - Newbob Scheduling
    - Quickprop
    - Nesterov Accelerated Gradient (NAG)
  - Higher-order gradients
    - Newton's method: typically not feasible, since computing and inverting the Hessian is far too expensive for networks with many parameters
    - Quasi-Newton methods
  - Unsure how it works:
    - Adam (Adaptive Moment Estimation)
    - Conjugate gradient
- Alternatives
  - Genetic algorithms
  - Simulated Annealing
  - Twiddle
  - Markov random fields (graphcut/mincut)
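
To make a few of the first-order methods above concrete, here is a minimal sketch (my own toy example, not from the thesis or the list) of the update rules for plain gradient descent, gradient descent with momentum, and Adam on a small quadratic. The function names, the toy quadratic, and all hyperparameter values are illustrative assumptions, not a reference implementation:

```python
import numpy as np

A = np.diag([1.0, 10.0])            # ill-conditioned toy quadratic f(w) = 0.5 * w^T A w
grad = lambda w: A @ w              # its gradient

def gradient_descent(w, lr=0.09, steps=100):
    """Plain gradient descent; with mini-batch gradients this becomes SGD."""
    for _ in range(steps):
        w = w - lr * grad(w)
    return w

def momentum(w, lr=0.09, beta=0.9, steps=100):
    """Gradient descent with momentum: accumulate an exponentially decaying velocity."""
    v = np.zeros_like(w)
    for _ in range(steps):
        v = beta * v + grad(w)
        w = w - lr * v
    return w

def adam(w, lr=0.1, b1=0.9, b2=0.999, eps=1e-8, steps=100):
    """Adam: bias-corrected first and second moments scale each parameter's step."""
    m, v = np.zeros_like(w), np.zeros_like(w)
    for t in range(1, steps + 1):
        g = grad(w)
        m = b1 * m + (1 - b1) * g            # first moment (running mean of gradients)
        v = b2 * v + (1 - b2) * g ** 2       # second moment (uncentred variance)
        m_hat = m / (1 - b1 ** t)            # bias correction for the zero initialisation
        v_hat = v / (1 - b2 ** t)
        w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w

w0 = np.array([3.0, 2.0])
print(gradient_descent(w0), momentum(w0), adam(w0))
```

Running it prints the final iterate of each method after the same number of steps, so you can compare how close each one gets to the minimum at the origin.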
You might also want to have a look at my article about optimization basics and at Alec Radford's nice GIFs (1 and 2).
Other interesting resources are:
- An overview of gradient descent optimization algorithms
Trade-Offs
I think all of the optimization algorithms listed here have scenarios where they have advantages. The general trade-offs are (see the toy comparison after this list):
- How much of an improvement do you get in one step?
- How fast can you calculate one step?
- How much data can the algorithm deal with?
- Is it guaranteed to find a local minimum?
- What requirements does the optimization algorithm place on your function? (e.g. that it be once, twice, or three times differentiable)
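
As a toy illustration of the first two trade-offs and of the differentiability requirement (again my own sketch, not part of the original list): on a simple quadratic, a single Newton step lands exactly on the minimum, but it needs the full Hessian and a linear solve, which is what makes Newton's method impractical for networks with millions of weights. Gradient descent only needs first derivatives and takes many cheap steps instead.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)         # symmetric positive definite Hessian of f
b = rng.standard_normal(n)
grad = lambda w: A @ w - b          # gradient of f(w) = 0.5 * w^T A w - b^T w
w_star = np.linalg.solve(A, b)      # exact minimiser, for comparison

# One Newton step from w = 0: expensive (needs the Hessian and an O(n^3) solve),
# but exact for a quadratic.
w_newton = -np.linalg.solve(A, grad(np.zeros(n)))

# Gradient descent: each step is cheap (one gradient), but many steps are needed.
w_gd = np.zeros(n)
lr = 1.0 / np.linalg.eigvalsh(A).max()
for _ in range(200):
    w_gd -= lr * grad(w_gd)

print(np.linalg.norm(w_newton - w_star))  # essentially 0 after a single step
print(np.linalg.norm(w_gd - w_star))      # small only after many cheap steps
```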