Understanding the collapse clause in OpenMP

遥遥无期 2020-11-30 04:44

I came across some OpenMP code that used the collapse clause, which was new to me. I'm trying to understand what it means, but I don't think I have fully grasped its implic...

2 answers
  • 2020-11-30 05:11

    If your goal is to balance the load across rows of varying length, and the workload for each item is regular or well scattered, how about folding the row indices in half and forgetting about the collapse clause?

    #pragma omp for
    for (int iy0=0; iy0<n; ++iy0){
      int iy = iy0;
      if (iy0 >= n/2) iy = n-1 -iy0 +n/2;
      for (int ix=iy+1; ix<n; ++ix){
        work(ix, iy);
      }
    }
    
  • 2020-11-30 05:17

    The problem with your code is that the iteration count of the inner loop depends on the outer loop variable. According to the OpenMP specification, in the section describing binding and the collapse clause:

    If execution of any associated loop changes any of the values used to compute any of the iteration counts, then the behavior is unspecified.

    You can use collapse when this is not the case, for example with a rectangular loop nest:

    #pragma omp parallel for private(j) collapse(2)
    for (i = 0; i < 4; i++)
        for (j = 0; j < 100; j++)
            work(i, j);  /* placeholder for the loop body */
    

    In fact this is a good example of when to use collapse. The outer loop has only four iterations, so if you have more than four threads, the extra threads get no work. When you collapse the two loops, the threads share 400 iterations, which is likely far more than the number of threads. Another reason to use collapse is uneven load. If you parallelized only the four outer iterations and the fourth took most of the time, the other threads would sit idle waiting for it. With 400 iterations the load is likely to be better distributed.

    You can fuse the loops by hand for the code above like this:

    #pragma omp parallel for
    for(int n=0; n<4*100; n++) {
        int i = n/100; int j=n%100;
        work(i, j);  /* placeholder for the loop body */
    }
    

    Here is an example showing how to fuse a triply nested loop by hand.

    Finally, here is an example showing how to fuse a triangular loop, for which collapse is not defined.


    Here is a solution that maps a rectangular loop to the triangular loop in the OP's question. This can be used to fuse the OP's triangular loop.

    //int n = 4;
    for(int k=0; k<n*(n+1)/2; k++) {
        int i = k/(n+1), j = k%(n+1);
        if(j>i) i = n - i -1, j = n - j;
        printf("(%d,%d)\n", i,j);
    }
    

    This works for any value of n.

    The map for the OP's question goes from

    (0,0),
    (1,0), (1,1),
    (2,0), (2,1), (2,2),
    (3,0), (3,1), (3,2), (3,3),
    

    to

    (0,0), (3,3), (3,2), (3,1), (3,0),
    (1,0), (1,1), (2,2), (2,1), (2,0),
    

    For odd values of n the map is not exactly a rectangle but the formula still works.

    For example n = 3 gets mapped from

    (0,0),
    (1,0), (1,1),
    (2,0), (2,1), (2,2),
    

    to

    (0,0), (2,2), (2,1), (2,0),
    (1,0), (1,1),
    

    Here is code to test this

    #include <stdio.h>
    int main(void) {
        int n = 4;
        for(int i=0; i<n; i++) {
            for(int j=0; j<=i; j++) {
                printf("(%d,%d)\n", i,j);
            }
        }
        puts("");
        for(int k=0; k<n*(n+1)/2; k++) {
            int i = k/(n+1), j = k%(n+1);
            if(j>i) i = n - i - 1, j = n - j;
            printf("(%d,%d)\n", i,j);
        }
    }
    