Multiple parameter optimization with lots of local minima

后端 未结 5 900
栀梦
栀梦 2021-02-04 11:47

I\'m looking for algorithms to find a \"best\" set of parameter values. The function in question has a lot of local minima and changes very quickly. To make matters even worse,

相关标签:
5条回答
  • 2021-02-04 12:18

    I can't really help you with finding an algorithm for your specific problem.

    However in regards to the random choosing of parameters I think what you are looking for are genetic algorithms. Genetic algorithms are generally based on choosing some random input, selecting those, which are the best fit (so far) for the problem, and randomly mutating/combining them to generate a next generation for which again the best are selected.

    If the function is more or less continous (that is small mutations of good inputs generally won't generate bad inputs (small being a somewhat generic)), this would work reasonably well for your problem.

    0 讨论(0)
  • 2021-02-04 12:20

    How many parameters are there -- eg, how many dimensions in the search space? Are they continuous or discrete - eg, real numbers, or integers, or just a few possible values?

    Approaches that I've seen used for these kind of problems have a similar overall structure - take a large number of sample points, and adjust them all towards regions that have "good" answers somehow. Since you have a lot of points, their relative differences serve as a makeshift gradient.

    • Simulated Annealing: The classic approach. Take a bunch of points, probabalistically move some to a neighbouring point chosen at at random depending on how much better it is.
    • Particle Swarm Optimization: Take a "swarm" of particles with velocities in the search space, probabalistically randomly move a particle; if it's an improvement, let the whole swarm know.
    • Genetic Algorithms: This is a little different. Rather than using the neighbours information like above, you take the best results each time and "cross-breed" them hoping to get the best characteristics of each.

    The wikipedia links have pseudocode for the first two; GA methods have so much variety that it's hard to list just one algorithm, but you can follow links from there. Note that there are implementations for all of the above out there that you can use or take as a starting point.

    Note that all of these -- and really any approach to this large-dimensional search algorithm - are heuristics, which mean they have parameters which have to be tuned to your particular problem. Which can be tedious.

    By the way, the fact that the function evaluation is so expensive can be made to work for you a bit; since all the above methods involve lots of independant function evaluations, that piece of the algorithm can be trivially parallelized with OpenMP or something similar to make use of as many cores as you have on your machine.

    0 讨论(0)
  • 2021-02-04 12:20

    Your situation seems to be similar to that of the poster of Software to Tune/Calibrate Properties for Heuristic Algorithms, and I would give you the same advice I gave there: consider a Metropolis-Hastings like approach with multiple walkers and a simulated annealing of the step sizes.

    The difficulty in using a Monte Carlo methods in your case is the expensive evaluation of each candidate. How expensive, compared to the time you have at hand? If you need a good answer in a few minutes this isn't going to be fast enough. If you can leave it running over night, it'll work reasonably well.

    Given a complicated search space, I'd recommend a random initial distributed. You final answer may simply be the best individual result recorded during the whole run, or the mean position of the walker with the best result.

    Don't be put off that I was discussing maximizing there and you want to minimize: the figure of merit can be negated or inverted.

    0 讨论(0)
  • 2021-02-04 12:23

    I've tried Simulated Annealing and Particle Swarm Optimization. (As a reminder, I couldn't use gradient descent because the gradient cannot be computed).

    I've also tried an algorithm that does the following:

    • Pick a random point and a random direction
    • Evaluate the function
    • Keep moving along the random direction for as long as the result keeps improving, speeding up on every successful iteration.
    • When the result stops improving, step back and instead attempt to move into an orthogonal direction by the same distance.

    This "orthogonal direction" was generated by creating a random orthogonal matrix (adapted this code) with the necessary number of dimensions.

    If moving in the orthogonal direction improved the result, the algorithm just continued with that direction. If none of the directions improved the result, the jump distance was halved and a new set of orthogonal directions would be attempted. Eventually the algorithm concluded it must be in a local minimum, remembered it and restarted the whole lot at a new random point.

    This approach performed considerably better than Simulated Annealing and Particle Swarm: it required fewer evaluations of the (very slow) function to achieve a result of the same quality.

    Of course my implementations of S.A. and P.S.O. could well be flawed - these are tricky algorithms with a lot of room for tweaking parameters. But I just thought I'd mention what ended up working best for me.

    0 讨论(0)
  • 2021-02-04 12:32

    There is no generalized way to answer your question. There are lots of books/papers on the subject matter, but you'll have to choose your path according to your needs, which are not clearly spoken here.

    Some things to know, however - 1min/test is way too much for any algorithm to handle. I guess that in your case, you must really do one of the following:

    • get 100 computers to cut your parameter testing time to some reasonable time
    • really try to work out your parameters by hand and mind. There must be some redundancy and at least some sanity check so you can test your case in <1min
    • for possible result sets, try to figure out some 'operations' that modify it slightly instead of just randomizing it. For example, in TSP some basic operator is lambda, that swaps two nodes and thus creates new route. Your can be shifting some number up/down for some value.
    • then, find yourself some nice algorithm, your starting point can be somewhere here. The book is invaluable resource for anyone who starts with problem-solving.
    0 讨论(0)
提交回复
热议问题