SciPy optimisation: Newton-CG vs BFGS vs L-BFGS

后端 未结 1 1598
旧时难觅i
旧时难觅i 2021-02-05 22:44

I am doing an optimisation problem using Scipy, where I am taking a flat network of vertices and bonds of size NNxNN, connecting two sides of it (i.e., making it pe

1条回答
  •  庸人自扰
    2021-02-05 23:01

    Your question is missing two important information: The energy function and the initial guess. The energy function can be convex/non-convex, smooth/piecewise-smooth/discontinuous. For this reason, it makes it hard to fully answer your question in your context. However, I can explain some key differences between BFGS and L-BFGS-B.

    Both methods are iterative methods for solving nonlinear optimization problems. They both approximate the Newton method by using an approximation of the Hessian of the function at every iteration. The key difference with the Newton method is that instead of computing the full Hessian at a specific point, they accumulate the gradients at previous points and use the BFGS formula to put them together as an approximation of the Hessian. Newton and BFGS methods are not guaranteed to converge unless the function has a quadratic Taylor expansion near an optimum.

    The original BFGS method accumulates all gradients since the given initial guess. There is two problems with this method. First, the memory can increase indefinitely. Second, for nonlinear problems, the Hessian at the initial guess is often not representative of the Hessian at the solution. The approximated Hessian will thus be biased until enough gradients are accumulated close to the solution. This can slow down convergence, but should, in my experience, still converge with a good line search algorithm for energy functions that have a single local minimum.

    L-BFGS is the same as BFGS but with a limited-memory, which means that after some time, old gradients are discarded to leave more space for freshly computed gradients. This solves the problem of the memory, and it avoids the bias of the initial gradient. However, depending on the number of gradients kept in memory, the Hessian might never be precisely estimated, and can be another source of bias. This can also slow down convergence, but again, it should still converge with a good line search algorithm for energy functions that have a single local minimum.

    L-BFGS-B is the same as L-BFGS but with bound constraints on the input variables. L-BFGS-B will stop optimizing variables that are on the boundary of the domain. Since you did not specify any constraints, this aspect of the algorithm does not apply to your problem.

    My hypothesis is that you are trying to solve a smooth but non-convex problem using an initial guess that is far from the solution, and that you end up in a local minimum. Since you mentioned that you start from a flat configuration, I would not be surprised that you start in a singularity that leads to a degenerate Hessian, which can cause troubles for the rest of the optimization. The only difference between BFGS and L-BFGS in your case is that every iteration will compute a gradient that is slightly different, and that the L-BFGS method will end up following a path that leads to the global minimum.

    0 讨论(0)
提交回复
热议问题