问题
What is the proper way to parallelize a multi-dimensional embarrassingly parallel loop in OpenMP? The number of dimensions is known at compile-time, but which dimensions will be large is not. Any of them may be one, two, or a million. Surely I don't want N omp parallel
's for an N-dimensional loop...
Thoughts:
The problem is conceptually simple. Only the outermost 'large' loop needs to be parallelized, but the loop dimensions are unknown at compile-time and may change.
Will dynamically setting
omp_set_num_threads(1)
and#pragma omp for schedule(static, huge_number)
make certain loop parallelizations a no-op? Will this have undesired side-effects/overhead? Feels like a kludge.The OpenMP Specification (2.10, A.38, A.39) tells the difference between conforming and non-conforming nested parallelism, but doesn't suggest the best approach to this problem.
Re-ordering the loops is possible but may result in a lot of cache-misses. Unrolling is possible but non-trivial. Is there another way?
Here's what I'd like to parallelize:
for(i0=0; i0<n[0]; i0++) {
for(i1=0; i1<n[1]; i1++) {
...
for(iN=0; iN<n[N]; iN++) {
<embarrasingly parallel operations>
}
...
}
}
Thanks!
回答1:
The collapse
directive is probably what you're looking for, as described here. This will essentially form a single loop, which is then parallized, and is designed for exactly these sorts of situations. So you'd do:
#pragma omp parallel for collapse(N)
for(int i0=0; i0<n[0]; i0++) {
for(int i1=0; i1<n[1]; i1++) {
...
for(int iN=0; iN<n[N]; iN++) {
<embarrasingly parallel operations>
}
...
}
}
and be all set.
来源:https://stackoverflow.com/questions/5287321/multi-dimensional-nested-openmp-loop