I have a fairly large vector, and some of its elements match a certain condition. I want to check the elements in parallel and find the first element that matches the condition.
My p
With OpenMP you can try to build a for loop with #pragma omp for schedule(dynamic). Each thread then executes one iteration at a time, and iterations are handed out in the same order as the elements of your vector.
If you want each thread to check 4 elements at a time, try #pragma omp for schedule(dynamic, 4).
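A minimal sketch of this approach, under a few illustrative assumptions: the data is an int array, the condition is stood in for by v[i] > threshold, and first_greater_dynamic is a hypothetical name. A plain worksharing loop has no early exit, so each thread remembers the smallest matching index it has seen and the per-thread results are merged in a critical section (C has no min reduction in OpenMP 2.0).

```c
/* Sketch: index of the first element of v[0..n) that is greater than
   threshold, or -1 if there is none. The comparison stands in for the
   question's check(); first_greater_dynamic is an illustrative name. */
int first_greater_dynamic(const int *v, int n, int threshold)
{
    int first = n;                      /* n encodes "no match yet" */

    #pragma omp parallel
    {
        int mine = n;                   /* smallest match seen by this thread */
        int i;

        /* Chunks of 4 consecutive iterations go to whichever thread
           asks for work next */
        #pragma omp for schedule(dynamic, 4)
        for (i = 0; i < n; i++)
        {
            /* No early exit: a thread only skips indices beyond its own
               best match, since those can no longer win */
            if (i < mine && v[i] > threshold)
                mine = i;
        }

        /* Manual min reduction: C has no min reduction in OpenMP 2.0 */
        #pragma omp critical(first_greater_dynamic_min)
        {
            if (mine < first)
                first = mine;
        }
    }
    return first < n ? first : -1;
}
```

Compiled without OpenMP support, the pragmas are ignored and the function runs serially with the same result.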
Since OpenMP 2.0 does not have cancellation constructs, you have to implement cancellation yourself, e.g., by using a shared variable. It also means that you cannot use the for worksharing construct with an early exit, as breaking out of parallel loops is not permitted (that's why OpenMP 4.0 introduced cancellation constructs). If you check for cancellation between the evaluations of individual elements, two or more threads may each find a matching element before they notice that the search is over. Thus, you should perform a min reduction on the index:
int found = 0;
int first_index = INVALID_VALUE;
int iteration = 0;
#pragma omp parallel
{
    int my_index = INVALID_VALUE;
    int i;
    do
    {
        // Later versions of OpenMP allow for "atomic capture"
        // but OpenMP 2.0 requires a critical directive instead
        #pragma omp critical(iteration)
        {
            i = iteration++;
        }
        if (i < N && check(i))
        {
            found = 1;
            my_index = i;
        }
    } while (!found && i < N);
    #pragma omp critical(reduction)
    if (my_index != INVALID_VALUE)
    {
        if (first_index == INVALID_VALUE || my_index < first_index)
            first_index = my_index;
    }
    // Only needed if more code follows before the end of the region
    #pragma omp barrier
    ...
}
This code assumes that checking the condition for the i-th element (check(i)) does not alter the state of the element. Therefore, the worst that can happen is that the thread that has found a matching element has to wait for all other threads to finish checking the element they are currently working on, and that waiting time is bounded by the longest of those in-flight checks.
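For experimenting, the shared-counter search can be wrapped into a complete function. This is a sketch under illustrative assumptions: v[i] > threshold stands in for check(i), n for N, -1 for INVALID_VALUE, and first_greater_counter is a hypothetical name. As in the original, found is read without atomics and relies on the flush implied by the critical construct.

```c
/* Sketch of the shared-counter search as a complete function.
   v[i] > threshold stands in for check(i); -1 plays INVALID_VALUE. */
int first_greater_counter(const int *v, int n, int threshold)
{
    int found = 0;                 /* shared "stop searching" flag */
    int first_index = -1;
    int iteration = 0;             /* shared: next index to hand out */

    #pragma omp parallel
    {
        int my_index = -1;
        int i;
        do
        {
            /* Indices are handed out one at a time in increasing order.
               With OpenMP 3.1 or later, "#pragma omp atomic capture"
               could replace this critical section. */
            #pragma omp critical(first_greater_counter_it)
            {
                i = iteration++;
            }
            /* Every index that was handed out is checked, so all indices
               below a reported match are examined by someone */
            if (i < n && v[i] > threshold)
            {
                found = 1;
                my_index = i;
            }
        } while (!found && i < n);

        /* Min reduction over the per-thread results */
        #pragma omp critical(first_greater_counter_red)
        {
            if (my_index != -1 &&
                (first_index == -1 || my_index < first_index))
                first_index = my_index;
        }
    }
    return first_index;
}
```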
The critical construct used in the do-loop is expensive. If check() doesn't take that much time, then you might consider handing out chunks of iterations instead of single iterations:
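// my_chunk is private; chunk is a shared counter starting at 0;
// N_chunks = (N + chunk_size - 1) / chunk_size
do
{
    #pragma omp critical(chunk)
    {
        my_chunk = chunk++;
    }
    if (my_chunk >= N_chunks)
        break;
    // Do not test found in this loop: a match reported by another thread
    // may lie in a later chunk, while this earlier one holds the first.
    // The i < N test guards the last, possibly partial, chunk.
    for (i = my_chunk * chunk_size; i < (my_chunk+1)*chunk_size && i < N; i++)
    {
        if (check(i))
        {
            found = 1;
            my_index = i;
            break;
        }
    }
} while (!found && my_chunk < N_chunks);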
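A self-contained sketch of the chunked variant, again with v[i] > threshold standing in for check(i) and a hypothetical function name. The chunk size is a tuning parameter (it must be positive) that you would pick so that scanning one chunk costs noticeably more than one trip through the critical section. A thread finishes its current chunk even after another thread reports a match, because that match may lie in a later chunk.

```c
/* Sketch of the chunked search as a complete function. chunk_size is an
   illustrative tuning parameter (> 0); v[i] > threshold stands in for
   check(i). Returns the first matching index, or -1. */
int first_greater_chunked(const int *v, int n, int threshold, int chunk_size)
{
    int n_chunks = (n + chunk_size - 1) / chunk_size;  /* round up */
    int found = 0;                 /* shared stop flag */
    int chunk = 0;                 /* shared: next chunk to hand out */
    int first_index = -1;

    #pragma omp parallel
    {
        int my_index = -1;
        int my_chunk, i, end;
        do
        {
            #pragma omp critical(first_greater_chunked_next)
            {
                my_chunk = chunk++;
            }
            if (my_chunk >= n_chunks)
                break;
            end = (my_chunk + 1) * chunk_size;
            if (end > n)
                end = n;           /* last chunk may be partial */
            /* Scan the whole chunk regardless of found: an earlier chunk
               may hold the first match even if a later one reported first */
            for (i = my_chunk * chunk_size; i < end; i++)
            {
                if (v[i] > threshold)
                {
                    found = 1;
                    my_index = i;
                    break;
                }
            }
        } while (!found);

        /* Min reduction over the per-thread results */
        #pragma omp critical(first_greater_chunked_red)
        {
            if (my_index != -1 &&
                (first_index == -1 || my_index < first_index))
                first_index = my_index;
        }
    }
    return first_index;
}
```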
Another solution that works reasonably well when the number of elements is not that big and checking each one is expensive:
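#pragma omp parallel
{
    int i;
    #pragma omp for schedule(dynamic,x)
    for (i = 0; i < N; i++)
    {
        // first_match is shared and initialised to N ("no match");
        // it only ever decreases, so a stale read merely costs an extra check
        if (i < first_match && check(i))
        {
            #pragma omp critical(reduction)
            if (i < first_match)
                first_match = i;
        }
    }
}
Once a match has been recorded, the remaining iterations beyond it run "empty" bodies because of the short-circuiting of &&, so check() is no longer called for them. Elements before the match are still checked. This is necessary: with a dynamic schedule, another thread may still be working on an earlier element when a later match is found, so skipping every element as soon as any match exists could miss the first one.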
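A complete, self-contained version of the worksharing-loop approach, under the same illustrative assumptions (v[i] > threshold in place of check(i), a hypothetical function name, and a chunk parameter in place of x). It records the smallest matching index found so far and skips only elements beyond it, because skipping every element once any match exists could miss an earlier match that another thread has not reached yet.

```c
/* Sketch of the worksharing-loop search as a complete function.
   chunk (> 0) plays the role of x; v[i] > threshold stands in for
   check(i). Returns the first matching index, or -1. */
int first_greater_for(const int *v, int n, int threshold, int chunk)
{
    int best = n;                  /* smallest match so far; n = none */

    #pragma omp parallel
    {
        int i;
        #pragma omp for schedule(dynamic, chunk)
        for (i = 0; i < n; i++)
        {
            /* Short-circuit: indices beyond the current best skip the
               check. best only decreases, so a stale (larger) value
               read here can only cause an unnecessary check. */
            if (i < best && v[i] > threshold)
            {
                #pragma omp critical(first_greater_for_best)
                {
                    if (i < best)
                        best = i;
                }
            }
        }
    }
    return best < n ? best : -1;
}
```

Strictly speaking, the unsynchronized read of best is still a data race; with OpenMP 3.1 or later, #pragma omp atomic read would be the cleaner way to load it.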