I am a newbie to OpenMP, trying to parallelize Jarvis's algorithm. However, it turns out that the parallel program takes 2-3 times longer than the sequential code.
In the following piece of your code, the whole content of the parallel for loop is wrapped in a critical statement. This means that this part of the code is never executed by more than one thread at a time. Having multiple threads work one at a time is no faster than a single thread going through all the iterations. On top of that, some time is lost to synchronization overhead (each thread must acquire a mutex before entering the critical section and release it afterwards).
int l = 0, i;
#pragma omp parallel shared (n,l) private (i)
{
    #pragma omp for
    for (i = 1; i < n; i++)
    {
        #pragma omp critical
        {
            if (points[i].x < points[l].x)
                l = i;
        }
    }
}
The serial code needs some refactoring before it can be parallelized. Reduction is often a good approach for simple operations: have each thread compute a partial result on its share of the iterations (e.g. a partial minimum or a partial sum), then merge all the partial results into a global one. For supported operations, the #pragma omp for reduction(op:var) syntax can be used. In this case, however, the result is the index of the minimum rather than the minimum value itself, which the built-in reduction operators do not provide, so the reduction has to be done manually.
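For comparison, here is a minimal sketch of the built-in clause on an operation it does support: finding the smallest x value (not its index). The `Point` layout and the function name `min_x_value` are assumptions made for illustration, since the post does not show the type definition, and `reduction(min:...)` in C needs a compiler supporting OpenMP 3.1 or later.

```c
#include <assert.h>

// Hypothetical point type; the post does not show its definition.
typedef struct { int x, y; } Point;

// Returns the smallest x coordinate among points[0..n-1].
int min_x_value(const Point *points, int n)
{
    assert(n > 0);
    int min_x = points[0].x;
    // Each thread works on a private copy of min_x; OpenMP combines
    // the private copies with the min operator after the loop.
    #pragma omp parallel for reduction(min:min_x)
    for (int i = 1; i < n; i++) {
        if (points[i].x < min_x)
            min_x = points[i].x;
    }
    return min_x;
}
```

This yields the value only. Recovering which point holds that value is the part that needs the manual merge.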
See how the following code relies on thread-local variables to track the index of the minimum x, then enters a critical section only once per thread to compute the global minimum index.
int l = 0, i;
#pragma omp parallel shared (n,l) private (i)
{
    int l_local = 0; // This variable is private to each thread
    #pragma omp for nowait
    for (i = 1; i < n; i++)
    {
        // This part of the code can be executed in parallel
        // since all write operations are on thread-local variables
        if (points[i].x < points[l_local].x)
            l_local = i;
    }
    // The critical section is entered only once by each thread
    #pragma omp critical
    {
        if (points[l_local].x < points[l].x)
            l = l_local;
    }
    // A barrier is needed only if more code follows inside the parallel region;
    // otherwise there is an implicit barrier at the end of the parallel region
    #pragma omp barrier
}
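One detail worth noting: when several points share the minimum x, the order in which threads reach the critical section can change which index wins. The sketch below wraps the same manual reduction in a function and adds a tie-break on the lowest index, so the result matches the serial loop regardless of scheduling. The `Point` layout and the names `is_better` and `leftmost_index` are hypothetical, introduced only for this illustration.

```c
#include <assert.h>

// Hypothetical point type; the post does not show its definition.
typedef struct { int x, y; } Point;

// Returns 1 if index a should replace index b as the leftmost candidate:
// strictly smaller x, or equal x with a lower index (tie-break).
static int is_better(const Point *points, int a, int b)
{
    if (points[a].x != points[b].x)
        return points[a].x < points[b].x;
    return a < b;
}

// Returns the index of the leftmost point, lowest index on ties.
int leftmost_index(const Point *points, int n)
{
    assert(n > 0);
    int l = 0;
    #pragma omp parallel shared(points, n, l)
    {
        int l_local = 0; // private to each thread
        #pragma omp for nowait
        for (int i = 1; i < n; i++) {
            if (is_better(points, i, l_local))
                l_local = i;
        }
        // Each thread merges its partial result exactly once.
        #pragma omp critical
        {
            if (is_better(points, l_local, l))
                l = l_local;
        }
    }
    return l;
}
```

Compiled without OpenMP support, the pragmas are ignored and the function degrades to the serial loop, which makes it easy to compare results.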
The same principle should be applied to the second parallel loop, which suffers from the same issue: the critical statement serializes it entirely.
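The second loop is not shown here, so the following is only a sketch under an assumption: that it is the usual gift-wrapping step, which scans all points to find the next hull point after the current point p using an orientation test. The `Point` layout, the `orientation` helper and its return convention, and the name `next_hull_point` are all hypothetical. The structure is the same as before: a thread-local candidate, then one merge per thread.

```c
#include <assert.h>

// Hypothetical point type; the post does not show its definition.
typedef struct { int x, y; } Point;

// Returns 0 if a, b, c are collinear, 1 if a->b->c turns clockwise,
// 2 if it turns counterclockwise (an assumed convention).
static int orientation(Point a, Point b, Point c)
{
    long long val = (long long)(b.y - a.y) * (c.x - b.x)
                  - (long long)(b.x - a.x) * (c.y - b.y);
    if (val == 0) return 0;
    return (val > 0) ? 1 : 2;
}

// Given the index p of a hull point, returns the index of the next one.
int next_hull_point(const Point *points, int n, int p)
{
    assert(n >= 2 && p >= 0 && p < n);
    int q = (p + 1) % n;
    #pragma omp parallel shared(points, n, p, q)
    {
        int q_local = (p + 1) % n; // private candidate for this thread
        #pragma omp for nowait
        for (int i = 0; i < n; i++) {
            if (orientation(points[p], points[i], points[q_local]) == 2)
                q_local = i;
        }
        // Merge: keep whichever candidate wins the same orientation test.
        #pragma omp critical
        {
            if (orientation(points[p], points[q_local], points[q]) == 2)
                q = q_local;
        }
    }
    return q;
}
```

The merge is valid because p lies on the hull, so the orientation test orders all candidates consistently. When several points are collinear with p, the serial and parallel versions may pick different ones among them.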