Parallel many-dimensional optimization

I am building a script that generates input data [parameters] for another program to calculate. I would like to optimize the resulting data. Previously I have been using the num

7 answers

挽巷

2021-02-07 21:09

Are derivatives of your goal function available? If yes, you can use gradient descent (old, slow but reliable) or conjugate gradient. If not, you can approximate the derivatives using finite differences and still use these methods. I think in general, if using finite difference approximations to the derivatives, you are much better off using conjugate gradients rather than Newton's method.
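
For the finite-difference route, here is a minimal sketch. It assumes the goal function takes a list of floats. It combines Polak-Ribière nonlinear conjugate gradient (with the common "PR+" reset of a negative beta to zero), a simple backtracking line search, and a central-difference gradient. The helper names are illustrative and not taken from any library. A production code would use a stronger line search.

```python
import math

def fd_gradient(f, x, h=1e-6):
    """Central-difference gradient: costs 2 * len(x) evaluations of f."""
    grad = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        grad.append((f(xp) - f(xm)) / (2 * h))
    return grad

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def conjugate_gradient(f, x0, tol=1e-6, max_iter=500):
    """Polak-Ribiere (PR+) nonlinear CG with a backtracking Armijo line search."""
    x = list(x0)
    g = fd_gradient(f, x)
    d = [-gi for gi in g]
    for _ in range(max_iter):
        if math.sqrt(dot(g, g)) < tol:
            break
        if dot(g, d) >= 0:  # not a descent direction: restart with steepest descent
            d = [-gi for gi in g]
        step, fx, slope = 1.0, f(x), dot(g, d)
        # halve the step until the sufficient-decrease (Armijo) condition holds
        while step > 1e-12 and f([xi + step * di for xi, di in zip(x, d)]) > fx + 1e-4 * step * slope:
            step *= 0.5
        x = [xi + step * di for xi, di in zip(x, d)]
        g_new = fd_gradient(f, x)
        beta = max(0.0, dot(g_new, [a - b for a, b in zip(g_new, g)]) / dot(g, g))
        d = [-gn + beta * di for gn, di in zip(g_new, d)]
        g = g_new
    return x
```

For example, `conjugate_gradient(lambda v: (v[0] - 1) ** 2 + 10 * (v[1] + 2) ** 2, [5.0, 5.0])` should approach `[1, -2]`.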

A more modern method is SPSA (simultaneous perturbation stochastic approximation), which is stochastic and doesn't require derivatives. For somewhat well-behaved problems, SPSA needs far fewer evaluations of the goal function than the finite-difference approximation to conjugate gradients for the same rate of convergence.
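
Here is a minimal SPSA sketch, assuming the goal function takes a list of floats. Each iteration perturbs all coordinates at once by a random ±1 vector. It therefore needs only two evaluations of f whatever the dimension, and those two can run in parallel. The gain exponents 0.602 and 0.101 are the values commonly recommended by Spall. `a`, `c` and `A` are problem-dependent, and the defaults here are only illustrative.

```python
import random

def spsa(f, x0, iterations=2000, a=0.1, c=0.1, A=100, alpha=0.602, gamma=0.101, seed=0):
    """Simultaneous perturbation stochastic approximation (minimization)."""
    rng = random.Random(seed)
    x = list(x0)
    for k in range(iterations):
        ak = a / (k + 1 + A) ** alpha  # step-size gain, decays slowly
        ck = c / (k + 1) ** gamma      # perturbation size, decays slowly
        # random +/-1 (Rademacher) perturbation of every coordinate at once
        delta = [rng.choice((-1.0, 1.0)) for _ in x]
        y_plus = f([xi + ck * di for xi, di in zip(x, delta)])
        y_minus = f([xi - ck * di for xi, di in zip(x, delta)])
        diff = (y_plus - y_minus) / (2 * ck)
        # gradient estimate g_i = diff / delta_i, and 1 / delta_i == delta_i for +/-1
        x = [xi - ak * diff * di for xi, di in zip(x, delta)]
    return x
```

Usage: `spsa(lambda v: sum((vi - 3.0) ** 2 for vi in v), [0.0] * 5)` drifts toward `[3.0] * 5`.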

佛祖请我去吃肉

2021-02-07 21:12

I think what you want to do is use Python's built-in threading capabilities. Provided your worker function has more or less the same run time whatever the parameters, this would be efficient.

Create a pool of 8 threads, run 8 instances of your function, collect the 8 results, let your optimisation algorithm change the parameters based on those 8 results, repeat... profit?
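
Here is a minimal sketch of that loop using `concurrent.futures.ThreadPoolExecutor`. `evaluate` is a hypothetical stand-in for running the external program. In practice it would call something like `subprocess.run` and parse the output. The threads would then spend their time waiting on child processes, so Python's GIL is not a bottleneck. The "optimisation algo" here is a simple random search around the current best point, chosen only for illustration.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def evaluate(params):
    # Hypothetical stand-in for launching the external program with `params`
    # and reading back its result (e.g. via subprocess.run); here a toy function.
    return sum((p - 1.5) ** 2 for p in params)

def parallel_random_search(x0, n_workers=8, rounds=200, step=0.5, seed=0):
    rng = random.Random(seed)
    best_x, best_f = list(x0), evaluate(x0)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for _ in range(rounds):
            # 1. propose n_workers candidate parameter sets around the current best
            batch = [[xi + rng.gauss(0.0, step) for xi in best_x] for _ in range(n_workers)]
            # 2. evaluate them all at once, one per thread
            results = list(pool.map(evaluate, batch))
            # 3. update: keep the best candidate, or shrink the step if none improved
            i = min(range(n_workers), key=results.__getitem__)
            if results[i] < best_f:
                best_x, best_f = batch[i], results[i]
            else:
                step *= 0.8
    return best_x, best_f
```

Usage: `best_x, best_f = parallel_random_search([0.0, 0.0, 0.0])`. Any optimiser that consumes a batch of results per round can replace the random-search update in step 3.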

执念已碎

2021-02-07 21:13

You could parallelize at two levels: 1) parallelize the computation within a single iteration, or 2) start from N initial guesses in parallel.

For 2) you need a job controller to manage the N threads, each searching from one initial guess.

Please add an extra output to your program: a "lower bound" indicating that descending from the current input parameters will never bring the output value below this bound.

The N initial-guess threads can then compete with each other: if any thread's lower bound is higher than another thread's current value, the job controller can drop that thread.
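
Here is a minimal sketch of option 2, assuming your program can be extended to report such a lower bound alongside its result. `run_program` is a hypothetical toy stand-in: a 1-D function with three basins whose floors serve as exact lower bounds. Plain gradient descent stands in for whatever local optimizer each start actually runs.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the external program: a 1-D function with three
# basins. Besides the value it reports a lower bound, the floor of the basin
# the point is in: descending from here can never go below it.
CENTERS = [-4.0, 0.0, 5.0]
FLOORS = [1.0, 3.0, -2.0]

def run_program(x):
    # (value, lower_bound) of whichever basin gives the smallest value at x
    return min(((x - c) ** 2 + fl, fl) for c, fl in zip(CENTERS, FLOORS))

def descend(x, lr=0.25, h=1e-6):
    """One local gradient-descent step (central difference), then re-evaluate."""
    grad = (run_program(x + h)[0] - run_program(x - h)[0]) / (2 * h)
    x_new = x - lr * grad
    return x_new, run_program(x_new)

def multistart(starts, rounds=40):
    active = dict(enumerate(starts))  # start id -> current point
    state = {}                        # start id -> (value, lower bound)
    with ThreadPoolExecutor(max_workers=len(starts)) as pool:
        for _ in range(rounds):
            ids = list(active)
            outcomes = pool.map(descend, [active[i] for i in ids])
            for i, (x, result) in zip(ids, outcomes):
                active[i], state[i] = x, result
            best = min(state[i][0] for i in active)
            # job controller: drop every start whose lower bound exceeds the best current value
            active = {i: x for i, x in active.items() if state[i][1] <= best}
    winner = min(active, key=lambda i: state[i][0])
    return active[winner], state[winner][0], sorted(active)
```

With starts `[-6.0, -1.0, 3.0, 7.0]`, the two starts stuck in the shallower basins are dropped after the first round. The search ends near `x = 5` with value `-2`.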

清酒与你

2021-02-07 21:15
Parallelizing local optimizers is intrinsically limited: they start from a single initial point and try to work downhill, so later points depend on the values of previous evaluations. Nevertheless there are some avenues where a modest amount of parallelization can be added.
- As another answer points out, if you need to evaluate your derivative with a finite-difference method, preferably with an adaptive step size, this may require many function evaluations. However, the derivatives with respect to the different variables can be computed independently, so you could get a speedup of up to a factor of twice the number of dimensions of your problem (two evaluations per variable with central differences). If you've got more processors than you know what to do with, you can use higher-order-accurate gradient formulae that require more (parallel) evaluations.
- Some algorithms, at certain stages, use finite differences to estimate the Hessian matrix. This requires roughly half the square of the number of dimensions of your problem in function evaluations, and all of them can be done in parallel.

Some algorithms may also be able to use more parallelism at a modest algorithmic cost. For example, quasi-Newton methods build an approximation of the Hessian matrix, usually updating it from a newly evaluated gradient. They take a step towards the minimum and evaluate a new gradient to update the Hessian. If you've got enough processors that evaluating a Hessian is as fast as evaluating the function once, you could probably improve these methods by evaluating the full Hessian at every step.
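
Here is a minimal sketch of "evaluate the Hessian at every step". It assumes the goal function takes a list of floats and is safe to call from several threads. All 1 + 2N + N(N+1)/2 finite-difference points for one step go to a thread pool in a single batch, and a plain Newton step follows. The helper names are illustrative. There is no line search or Hessian modification, so this assumes a start close enough that the Hessian is positive definite. For a CPU-bound pure-Python function you would swap in `ProcessPoolExecutor` (which needs a picklable, module-level function).

```python
from concurrent.futures import ThreadPoolExecutor

def shifted(x, *moves):
    """Copy of x with each (index, amount) in moves added."""
    y = list(x)
    for i, amount in moves:
        y[i] += amount
    return y

def grad_and_hessian(f, x, h=1e-4, workers=8):
    n = len(x)
    pairs = [(i, j) for i in range(n) for j in range(i, n)]
    # every point needed for one step: x, x +/- h e_i (central gradient)
    # and x + h e_i + h e_j for i <= j (forward-difference Hessian)
    points = ([list(x)] + [shifted(x, (i, h)) for i in range(n)]
              + [shifted(x, (i, -h)) for i in range(n)]
              + [shifted(x, (i, h), (j, h)) for i, j in pairs])
    with ThreadPoolExecutor(max_workers=workers) as pool:
        vals = list(pool.map(f, points))  # one parallel batch
    f0, fp, fm, fpp = vals[0], vals[1:n + 1], vals[n + 1:2 * n + 1], vals[2 * n + 1:]
    grad = [(fp[i] - fm[i]) / (2 * h) for i in range(n)]
    hess = [[0.0] * n for _ in range(n)]
    for (i, j), v in zip(pairs, fpp):
        hess[i][j] = hess[j][i] = (v - fp[i] - fp[j] + f0) / h ** 2
    return grad, hess

def solve(a, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def newton(f, x0, iterations=20):
    x = list(x0)
    for _ in range(iterations):
        g, H = grad_and_hessian(f, x)  # fresh Hessian at every step
        step = solve(H, [-gi for gi in g])
        x = [xi + si for xi, si in zip(x, step)]
    return x
```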

As far as implementations go, I'm afraid you're somewhat out of luck. There are a number of clever and/or well-tested implementations out there, but they're all, as far as I know, single-threaded. Your best bet is to use an algorithm that requires a gradient and compute your own in parallel. It's not that hard to write an adaptive one that runs in parallel and chooses sensible step sizes for its numerical derivatives.
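
Here is a minimal sketch of such an adaptive, parallel gradient. It assumes the goal function takes a list of floats and can be called from several threads at once. Each coordinate runs in its own worker, because the partial derivatives are independent. The step-halving rule, the starting step `1e-2 * max(1, |x_i|)`, and the Richardson-extrapolated return value are illustrative choices, not a standard library routine.

```python
from concurrent.futures import ThreadPoolExecutor

def partial_derivative(f, x, i, rtol=1e-6, max_halvings=10):
    """Central-difference derivative along coordinate i, halving the step
    until two successive estimates agree (a simple adaptive rule)."""
    def central(h):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        return (f(xp) - f(xm)) / (2 * h)

    h = 1e-2 * max(1.0, abs(x[i]))
    prev = central(h)
    for _ in range(max_halvings):
        h /= 2
        cur = central(h)
        if abs(cur - prev) <= rtol * max(1.0, abs(cur)):
            # Richardson extrapolation cancels the O(h^2) error term
            return (4 * cur - prev) / 3
        prev = cur
    return prev  # no agreement (e.g. noisy f): return the last estimate

def parallel_gradient(f, x, workers=8):
    """One independent worker per coordinate."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda i: partial_derivative(f, x, i), range(len(x))))
```

The result can then serve as the user-supplied gradient of any single-threaded, gradient-based optimizer.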
情书的邮戳

2021-02-07 21:16
There are two ways of estimating gradients, one easily parallelizable, one not:
- around a single point, e.g. (f(x + h·eᵢ) - f(x)) / h; this is easily parallelizable, with up to Ndim evaluations running at once;
- a "walking" gradient: walk from x₀ in direction e₀ to x₁, then from x₁ in direction e₁ to x₂, ...; this is sequential.

Minimizers that use gradients are highly developed and powerful, and they converge rapidly (superlinearly, or quadratically for Newton-type methods) on smooth enough functions. The user-supplied gradient function can, of course, be a parallel gradient estimator.

A few minimizers use "walking" gradients, among them Powell's method (see Numerical Recipes p. 509).

So I'm confused: how do you parallelize its inner loop?

I'd suggest scipy's fmin_tnc with a parallel gradient estimator, possibly using central rather than one-sided differences.
(Fwiw, this compares some of the scipy no-derivative optimizers on two 10-d functions; ymmv.)
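
Here is a minimal sketch of such a parallel gradient estimator around a single point. It assumes the goal function takes a list of floats and is thread-safe. All evaluations are independent, so they go to a thread pool in one batch: 2N for central differences, N + 1 for one-sided ones. `make_parallel_gradient` is a hypothetical helper name, and `h=1e-6` is an illustrative default. A long-lived pool would avoid re-creating threads on every call.

```python
from concurrent.futures import ThreadPoolExecutor

def make_parallel_gradient(f, h=1e-6, central=True, workers=8):
    """Return grad(x) estimating the gradient of f around the single point x."""
    def grad(x):
        x = [float(v) for v in x]  # also accepts array-like input
        n = len(x)

        def shifted(i, amount):
            p = list(x)
            p[i] += amount
            return p

        if central:
            points = [shifted(i, h) for i in range(n)] + [shifted(i, -h) for i in range(n)]
        else:
            points = [x] + [shifted(i, h) for i in range(n)]
        with ThreadPoolExecutor(max_workers=workers) as pool:
            vals = list(pool.map(f, points))  # all evaluations in parallel
        if central:
            return [(vals[i] - vals[n + i]) / (2 * h) for i in range(n)]
        return [(vals[i + 1] - vals[0]) / h for i in range(n)]
    return grad
```

With SciPy installed, the returned `grad` could be passed as `fprime=` to `fmin_tnc` (or as `jac=` to `scipy.optimize.minimize(..., method="TNC")`). Note that `f` then receives plain lists rather than NumPy arrays, so wrap it if it expects arrays.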
执念已碎

2021-02-07 21:27
If I understand your question correctly, you are trying to minimize your function one parameter at a time.

You can do this by creating a set of single-argument functions, each of which freezes all the arguments except one.

Then you loop over them, optimizing each variable in turn and updating the partial solution.

This method can greatly speed up the minimization of functions of many parameters when the energy landscape is not too complex (i.e. the dependency between the parameters is not too strong).

Given a function
```
energy(*args) -> value
```
you create the initial guess and the single-argument functions:
```
guess = [1,1,1,1]
# each func varies parameter i and keeps the others frozen at their current values in guess
funcs = [lambda x, i=i: energy(*(guess[:i] + [x] + guess[i+1:])) for i in range(len(guess))]
```
then you put them in a while loop for the optimization:
```
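converged = False
while not converged:
    previous = list(guess)
    for i, func in enumerate(funcs):
        # minimize_1d is a placeholder for any 1-D minimizer (e.g. golden-section search)
        guess[i] = minimize_1d(func, guess[i])  # update in place so the funcs see it
    # converged once a full sweep moves no parameter by more than tol (your chosen tolerance)
    converged = max(abs(a - b) for a, b in zip(guess, previous)) < tol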
```
This is a very simple yet effective way to simplify your minimization task. I can't recall what this method is called, but a close look at the Wikipedia entry on minimization should turn it up.
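
The scheme above is usually called coordinate descent (cyclic coordinate minimization). Here is a minimal runnable sketch. It assumes each one-parameter slice is unimodal within `radius` of the current value, so that golden-section search can serve as the 1-D minimizer. `radius`, the tolerances, and the helper names are illustrative choices.

```python
import math

def golden_section(func, lo, hi, tol=1e-10):
    """Minimize a unimodal 1-D function on [lo, hi] by golden-section search."""
    invphi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c, d = b - invphi * (b - a), a + invphi * (b - a)
    fc, fd = func(c), func(d)
    while b - a > tol:
        if fc < fd:  # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - invphi * (b - a)
            fc = func(c)
        else:        # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + invphi * (b - a)
            fd = func(d)
    return (a + b) / 2

def coordinate_descent(energy, guess, radius=10.0, tol=1e-8, max_sweeps=200):
    guess = [float(g) for g in guess]
    for _ in range(max_sweeps):
        previous = list(guess)
        for i in range(len(guess)):
            # freeze every argument except the i-th one
            func = lambda x, i=i: energy(*(guess[:i] + [x] + guess[i + 1:]))
            guess[i] = golden_section(func, guess[i] - radius, guess[i] + radius)
        if max(abs(a - b) for a, b in zip(guess, previous)) < tol:
            break
    return guess
```

For a weakly coupled `energy(a, b, c, d)` such as `(a-1)**2 + (b+2)**2 + (c-3)**2 + (d-0.5)**2 + 0.5*(a-1)*(b+2)`, starting from `[1, 1, 1, 1]`, the sweeps converge to `[1, -2, 3, 0.5]`.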