What are alternatives of Gradient Descent?

失恋的感觉 2021-01-31 22:19

Gradient Descent has the problem of local minima: in the worst case, we may need to restart it from exponentially many initial points to find the global minimum.

Can anybody tell me about any alternatives to gradient descent?
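To make the problem concrete, here is a rough sketch of the workaround I currently use: plain gradient descent restarted from random initial points on a toy non-convex 1-D function. The objective, learning rate and number of restarts below are only placeholders for illustration.

```python
import random

def f(x):          # toy non-convex objective with more than one local minimum
    return x**4 - 3 * x**3 + 2 * x

def grad_f(x):     # its derivative
    return 4 * x**3 - 9 * x**2 + 2

def gradient_descent(x0, lr=0.01, steps=1000):
    """Plain gradient descent from a single starting point."""
    x = x0
    for _ in range(steps):
        x -= lr * grad_f(x)
    return x

# Random restarts: keep the best of many runs and hope that at least one
# starting point lands in the basin of the global minimum. The number of
# restarts needed can grow very quickly with the number of local minima.
best_x = min((gradient_descent(random.uniform(-2, 3)) for _ in range(20)), key=f)
print(best_x, f(best_x))
```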

5 Answers
  •  太阳男子
    2021-01-31 23:01

    See my master's thesis for a very similar list:

    Optimization algorithms for neural networks

    • Gradient based
      • Flavours of gradient descent (only first order gradient):
        • Stochastic gradient descent
        • Mini-Batch gradient descent
        • Learning Rate Scheduling:
          • Momentum
          • RProp and the mini-batch version RMSProp
          • AdaGrad
          • Adadelta (paper)
          • Exponential Decay Learning Rate
          • Performance Scheduling
          • Newbob Scheduling
        • Quickprop
        • Nesterov Accelerated Gradient (NAG): Explanation
      • Higher order gradients
        • Newton's method: Typically not possible
        • Quasi-Newton method
          • BFGS
          • L-BFGS
      • Unsure how it works
        • Adam (Adaptive Moment Estimation)
          • AdaMax
        • Conjugate gradient
    • Alternatives
      • Genetic algorithms
      • Simulated Annealing (see the sketch after this list)
      • Twiddle
      • Markov random fields (graphcut/mincut)
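    Of these, Simulated Annealing is a good example of a method that needs no gradient at all, only function evaluations. Below is a minimal sketch in Python; the toy objective, the Gaussian proposal width and the geometric cooling schedule are placeholder choices for illustration, not a tuned implementation.

    ```python
    import math
    import random

    def f(x):          # toy non-convex objective (stand-in for your cost function)
        return x**4 - 3 * x**3 + 2 * x

    def simulated_annealing(x0, temp=10.0, cooling=0.99, steps=5000):
        x, fx = x0, f(x0)
        best_x, best_fx = x, fx
        for _ in range(steps):
            # Propose a random neighbour; no derivative information is used.
            cand = x + random.gauss(0, 0.5)
            f_cand = f(cand)
            # Always accept improvements; accept worse moves with a probability
            # that shrinks as the temperature decreases, which lets the search
            # escape local minima early on.
            if f_cand < fx or random.random() < math.exp((fx - f_cand) / temp):
                x, fx = cand, f_cand
                if fx < best_fx:
                    best_x, best_fx = x, fx
            temp *= cooling      # cool down
        return best_x, best_fx

    print(simulated_annealing(x0=random.uniform(-2, 3)))
    ```

    In practice you would still tune the proposal distribution and the cooling schedule to your problem; the point is only that f does not have to be differentiable.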

    You might also want to have a look at my article about optimization basics and at Alec Radford's nice GIFs (1 and 2).

    Other interesting resources are:

    • An overview of gradient descent optimization algorithms

    Trade-Offs

    I think each of the optimization algorithms listed above has scenarios where it is advantageous. The general trade-offs are:

    • How much of an improvement do you get in one step?
    • How fast can you calculate one step?
    • How much data can the algorithm deal with?
    • Is it guaranteed to find a local minimum?
    • What requirements does the optimization algorithm have for your function? (e.g. to be once, twice or three times differentiable)
