OpenMP C program runs slower than sequential code

既然无缘 2021-01-17 07:15

I am a newbie to OpenMP, trying to parallelize Jarvis's algorithm. However, it turns out that the parallel program takes 2-3 times longer than the sequential code.

1 answer
  • 2021-01-17 07:51

    In the following piece of your code, the entire body of the parallel for loop is wrapped in a critical section. This means that this part of the code is never executed by more than one thread at a time. Having multiple threads work one at a time cannot be faster than a single thread going through all the iterations. On top of that, time is lost to synchronization overhead (each thread must acquire a mutex before entering the critical section and release it afterwards).

    int l = 0,i;
    #pragma omp parallel shared (n,l) private (i)
    {
        #pragma omp for
        for (i = 1; i < n; i++)
        {
            #pragma omp critical
            {
                if (points[i].x < points[l].x)
                    l = i;
            }
        }
    }
    

    The serial code needs some refactoring before it can be parallelized. A reduction is often a good approach for simple operations: each thread computes a partial result over its share of the iterations (e.g. a partial minimum or a partial sum), then the partial results are merged into a global one. For supported operations, the #pragma omp for reduction(op:var) syntax can be used. In this case, however, the result is the index of the minimum rather than the minimum value itself, which the built-in operators do not provide, so the reduction has to be done manually.
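
    For comparison, here is a minimal sketch of the built-in reduction syntax on the two operations mentioned above (minimum and sum). The Point struct with int fields and the function names min_x and sum_x are assumptions made for illustration; the question's actual types are not shown. The min operator for C requires OpenMP 3.1 or later.

```c
#include <assert.h>

// Hypothetical point type; the question's actual struct is not shown.
typedef struct { int x, y; } Point;

// Smallest x coordinate, using the built-in min reduction.
// Each thread works on a private copy of xmin; OpenMP merges the copies
// after the loop.
int min_x(const Point *points, int n)
{
    assert(n > 0);
    int xmin = points[0].x;
    #pragma omp parallel for reduction(min:xmin)
    for (int i = 1; i < n; i++) {
        if (points[i].x < xmin)
            xmin = points[i].x;
    }
    return xmin;
}

// Sum of all x coordinates, using the built-in + reduction.
long sum_x(const Point *points, int n)
{
    long sum = 0;
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; i++)
        sum += points[i].x;
    return sum;
}
```

    Note that min_x returns the smallest value, not the position where it occurs, which is why this syntax alone does not cover the loop in the question.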

    See how the following code uses a thread-local variable to track the index of the minimum x within each thread's share of the iterations, then uses a single critical section per thread to compute the global minimum index.

    int l = 0,i;
    #pragma omp parallel shared (n,l) private (i)
    {
        int l_local = 0; //This variable is private to the thread
    
        #pragma omp for nowait
        for (i = 1; i < n; i++)
        {
            // This part of the code can be executed in parallel
            // since all write operations are on thread-local variables
            if (points[i].x < points[l_local].x)
                l_local = i;
        }
    
        // The critical section is entered only once per thread
        #pragma omp critical
        {
            if (points[l_local].x < points[l].x)
                l = l_local;
        }
    
        #pragma omp barrier
        // An explicit barrier is needed only if more code follows inside the
        // parallel region and reads l; otherwise the implicit barrier at the
        // end of the parallel region is enough
    }
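
    The same pattern can be packaged as a self-contained function, which makes it easier to check against the serial version. This is only a sketch: the Point struct and the name leftmost_index are assumptions, and the index tie-break in the critical section is an addition that is not in the code above.

```c
#include <assert.h>

// Hypothetical point type; the question's actual struct is not shown.
typedef struct { int x, y; } Point;

// Index of the point with the smallest x, computed with a manual reduction.
int leftmost_index(const Point *points, int n)
{
    assert(n > 0);
    int l = 0;

    #pragma omp parallel shared(l)
    {
        int l_local = 0; // private to each thread

        // No synchronization here: every write goes to l_local
        #pragma omp for nowait
        for (int i = 1; i < n; i++) {
            if (points[i].x < points[l_local].x)
                l_local = i;
        }

        // Merge step, entered once per thread. Ties on x are resolved
        // toward the smaller index so the merge does not depend on which
        // thread arrives first.
        #pragma omp critical
        {
            if (points[l_local].x < points[l].x ||
                (points[l_local].x == points[l].x && l_local < l))
                l = l_local;
        }
    } // implicit barrier: l is final after this point

    return l;
}
```

    Compiled without OpenMP support, the pragmas are ignored and the function degrades to the plain serial loop, so the same source can be used to compare timings.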
    

    The same principle should be applied to the second parallel loop, which suffers from the same issue: the critical section serializes it entirely.
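
    The second loop is not reproduced here, so the following is only a sketch under the assumption that it is the usual Jarvis march step: given the current hull point p, find the point q such that no other point lies to the right of the line from p to q. The Point struct, the cross helper and the name next_hull_point are all hypothetical, and the sketch assumes distinct points; with collinear points, which of them is returned may differ from the serial code.

```c
#include <assert.h>

// Hypothetical point type; the question's actual struct is not shown.
typedef struct { int x, y; } Point;

// Cross product of (b - a) and (c - a).
// Negative when c lies to the right of the directed line a -> b.
static long long cross(Point a, Point b, Point c)
{
    return ((long long)b.x - a.x) * ((long long)c.y - a.y)
         - ((long long)b.y - a.y) * ((long long)c.x - a.x);
}

// Hypothetical helper: index of the hull point that follows hull point p
// when walking the hull counterclockwise.
int next_hull_point(const Point *points, int n, int p)
{
    assert(n > 1 && p >= 0 && p < n);
    int q = (p + 1) % n;

    #pragma omp parallel shared(q)
    {
        int q_local = (p + 1) % n; // private candidate for this thread

        // Each thread keeps the best candidate among its own iterations
        #pragma omp for nowait
        for (int i = 0; i < n; i++) {
            if (cross(points[p], points[q_local], points[i]) < 0)
                q_local = i;
        }

        // One merge per thread instead of one lock per iteration
        #pragma omp critical
        {
            if (cross(points[p], points[q], points[q_local]) < 0)
                q = q_local;
        }
    }

    return q;
}
```

    Calling next_hull_point repeatedly, starting from the leftmost point and stopping when it returns that point again, would walk the hull. Each call only does O(n) work, so for small inputs the cost of opening a parallel region per step may still outweigh the gain.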
