Consider the following program, which calculates Fibonacci numbers. It uses OpenMP tasks for parallelization.
#include <iostream>
#include <cstdlib>
using namespace std;

long fib(int n) {
    if (n < 2) return n;
    long a, b;
    #pragma omp parallel   // a new parallel region in every recursive call
    #pragma omp single
    {
        #pragma omp task shared(a)
        a = fib(n - 1);
        #pragma omp task shared(b)
        b = fib(n - 2);
        #pragma omp taskwait
    }
    return a + b;
}

int main(int argc, char *argv[]) {
    int n = (argc > 1) ? atoi(argv[1]) : 10;
    cout << "fib(" << n << ") = " << fib(n) << endl;
    return 0;
}
With OMP_NESTED=FALSE, a team of threads is assigned to the top-level parallel region, and no extra threads are created at the nested levels, so at most two threads will be doing useful work.
With OMP_NESTED=TRUE, a team of threads is assigned at each level. On your system there are 8 logical CPUs, so the team size is likely 8. The team includes one thread from outside the region, so only 7 new threads are launched. The recursion tree for fib(n) has about fib(n) nodes. (A nice self-referential property of fib!) Thus the code might create 7*fib(n) threads, which can quickly exhaust resources.
The fix is to use a single parallel region around the entire task tree: move the omp parallel and omp single logic into main, outside of fib. That way a single thread team will work on the entire task tree.
The general point is to distinguish potential parallelism from actual parallelism. The task directive specifies potential parallelism, which might or might not actually be exploited during a given execution. An omp parallel directive (for all practical purposes) specifies actual parallelism. Usually you want the actual parallelism to match the available hardware, so as not to swamp the machine, but the potential parallelism to be much larger, so that the run-time can balance the load.