Question
The OpenMP standard specifies an initial value for a reduction variable. So do I have to initialize the variable myself, and how would I do that in the following case:
int sum;
//...
for(int it=0;it<maxIt;it++){
    #pragma omp parallel
    {
        #pragma omp for nowait
        for(int i=0;i<ct;i++)
            arrayX[i]=arrayY[i];
        sum = 0;
        #pragma omp for reduction(+:sum)
        for(int i=0;i<ct;i++)
            sum+=arrayZ[i];
    }
    //Use sum
}
Note that I use only one parallel region, to minimize overhead and to allow the nowait on the first loop. Used as-is, this would (IMO) lead to a data race, because threads that leave the first loop after other threads have already started the second loop will reset sum.
Of course I could do the reset at the top of the outer loop, but in general, and in large code bases in particular, it is easy to forget that it is needed there (or that it was put there), which produces unexpected results.
Does "omp single" help here? I suspect that while thread A executes the single, another thread may already enter the reduction loop.
"omp barrier" is possible, but I want to avoid it because it defeats the "nowait".
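A minimal sketch of the reset-at-the-top-of-the-outer-loop workaround for the first example, assuming int arrays of length ct; the function name copy_and_sum and the per_iter output array are hypothetical. The assignment sits in serial code before each parallel region starts, so no other thread of the team can be running when it executes.

```c
/* Hypothetical helper: runs maxIt iterations of "copy Y into X, then sum Z".
   sum is reset in serial code before each parallel region starts, so the
   assignment cannot race with the reduction. per_iter[it] receives the sum
   computed in iteration it (standing in for "Use sum"). */
void copy_and_sum(int *arrayX, const int *arrayY, const int *arrayZ,
                  int ct, int maxIt, int *per_iter)
{
    int sum; /* declared outside the parallel region, so it is shared there */
    for (int it = 0; it < maxIt; it++) {
        sum = 0; /* only the encountering thread executes this line */
        #pragma omp parallel
        {
            /* Threads may leave this loop early and start the next one. */
            #pragma omp for nowait
            for (int i = 0; i < ct; i++)
                arrayX[i] = arrayY[i];

            /* The implicit barrier at the end of this loop completes the
               reduction before the parallel region ends. */
            #pragma omp for reduction(+:sum)
            for (int i = 0; i < ct; i++)
                sum += arrayZ[i];
        }
        per_iter[it] = sum; /* "Use sum" */
    }
}
```

The trade-off is exactly the one the question raises: the reset is correct but lives away from the reduction it belongs to.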
And finally, another example:
#pragma omp parallel
{
    sum = 0;
    #pragma omp for reduction(+:sum)
    for(int i=0;i<ct;i++)
        sum+=arrayZ[i];
    //Use sum
    sum = 0;
    #pragma omp for reduction(+:sum)
    for(int i=0;i<ct;i++)
        sum+=arrayZ[i];
    //Use sum
}
How would I (re)initialize here?
Answer 1:
Edit: This answer is wrong as it makes an assumption that is not in the OpenMP specification. As accepted answers cannot be deleted, I'm leaving it here as an example that one should always doubt and validate code and/or statements found on the Internet.
Actually, the code doesn't exhibit data races:
#pragma omp parallel
{
    ...
    sum = 0;
    #pragma omp for reduction(+:sum)
    for(int i=0;i<ct;i++)
        sum+=arrayZ[i];
    ...
}
What happens here is that a private copy of sum is created inside the worksharing construct and is initialised to 0 (the initialisation value for the + operator). Each local copy is updated by the loop body. Once a given thread has finished, it waits at the implicit barrier present at the end of the for construct. Once all threads have reached the barrier, their local copies of sum are summed together and the result is added to the shared value.
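The last step, adding the combined result to the shared value rather than overwriting it, is why the original variable has to be initialised at all: the private copies start at 0, but the original keeps whatever it held before the construct. A minimal sketch, with the hypothetical function name sum_into:

```c
/* Hypothetical helper: adds the elements of arrayZ to an existing total.
   Each thread's private copy of sum starts at 0 (the identity of +); at the
   end of the construct the partial sums are combined with the value sum held
   beforehand, so the result is initial + arrayZ[0] + ... + arrayZ[ct-1]. */
int sum_into(int initial, const int *arrayZ, int ct)
{
    int sum = initial; /* the original list item */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < ct; i++)
        sum += arrayZ[i];
    return sum;
}
```

With initial set to 100 and arrayZ = {1, 2, 3, 4}, this returns 110, not 10, so a stale value left in sum by an earlier loop shows up in the next result.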
It doesn't matter that the threads might execute sum = 0; at different times, since its value is only updated once the barrier has been reached. Think of the code above as performing something like:
...
sum = 0;
// Start of the for worksharing construct
int local_sum = 0;                      // ^
for(int i = i_start; i < i_end; i++)    // | sum not used here
    local_sum += arrayZ[i];             // v
// Implicit construct barrier
#pragma omp barrier
// Reduction
#pragma omp atomic update
sum += local_sum;
#pragma omp barrier
// End of the worksharing construct
...
The same applies to the second example.
Answer 2:
The OpenMP specification does not prescribe when and how the original value gets updated, and it mandates the use of synchronisation (OpenMP, p.205):
To avoid race conditions, concurrent reads or updates of the original list item must be synchronized with the update of the original list item that occurs as a result of the reduction computation.
In both examples, either a barrier after the assignment to sum or a single construct (without nowait) is needed in order to prevent race conditions.
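A minimal sketch of how the single-based fix might look for both examples, assuming int arrays of length ct; the function names and the per_iter/results arrays are hypothetical. In the first function the single is placed before the nowait loop, so its implicit barrier comes first and threads can still move from the copy loop into the reduction loop without waiting. In the second function, sum is used and reset inside the same single after the first reduction's implicit barrier, so no thread can read sum while another resets it. If every thread needs to read sum, a single alone is not enough: its barrier is at its end, so an explicit barrier would be needed between those reads and the reset.

```c
/* Example 1: reset inside the parallel region. The single's implicit
   barrier comes before the nowait copy loop, so the copy loop and the
   reduction loop can still overlap. */
void copy_and_sum_single(int *arrayX, const int *arrayY, const int *arrayZ,
                         int ct, int maxIt, int *per_iter)
{
    int sum; /* shared inside the parallel region */
    for (int it = 0; it < maxIt; it++) {
        #pragma omp parallel
        {
            #pragma omp single
            sum = 0; /* implicit barrier at the end of single */

            #pragma omp for nowait
            for (int i = 0; i < ct; i++)
                arrayX[i] = arrayY[i];

            #pragma omp for reduction(+:sum)
            for (int i = 0; i < ct; i++)
                sum += arrayZ[i];
        }
        per_iter[it] = sum; /* "Use sum" */
    }
}

/* Example 2: two reductions in one parallel region. results[0] and
   results[1] stand in for the two "Use sum" steps. */
void two_reductions(const int *arrayZ, int ct, int *results)
{
    int sum; /* shared inside the parallel region */
    #pragma omp parallel
    {
        #pragma omp single
        sum = 0; /* implicit barrier: no thread reduces before the reset */

        #pragma omp for reduction(+:sum)
        for (int i = 0; i < ct; i++)
            sum += arrayZ[i];
        /* implicit barrier: sum now holds the complete result */

        #pragma omp single
        {
            results[0] = sum; /* use sum ... */
            sum = 0;          /* ... then reset it in the same single */
        }

        #pragma omp for reduction(+:sum)
        for (int i = 0; i < ct; i++)
            sum += arrayZ[i];

        #pragma omp single
        results[1] = sum;
    }
}
```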
Source: https://stackoverflow.com/questions/22938901/initialize-variable-for-omp-reduction