Intel compiler (C++) issue with OpenMP reduction on std::vector

后悔当初 2021-01-19 00:02

Since OpenMP 4.0, user-defined reductions have been supported, so I defined a reduction on std::vector in C++ exactly as shown here. It works fine with GNU/5.4.0 and GNU/6.4.0, but it

1 answer
  •  抹茶落季
    2021-01-19 00:38

    This appears to be a bug in the Intel compiler. I can reliably reproduce it with a C example that does not involve vectors:

    #include <stdio.h>
    
    void my_sum_fun(int* outp, int* inp) {
        printf("%d @ %p += %d @ %p\n", *outp, outp, *inp, inp);
        *outp = *outp + *inp;
    }
    
    int my_init(int* orig) {
        printf("orig: %d @ %p\n", *orig, orig);
        return *orig;
    }
    
    #pragma omp declare reduction(my_sum : int : my_sum_fun(&omp_out, &omp_in)) initializer(omp_priv = my_init(&omp_orig))
    
    int main()
    {   
        int s = 0;
        #pragma omp parallel for reduction(my_sum : s)
        for (int i = 0; i < 2; i++)
            s+= 1;
    
        printf("sum: %d\n", s);
    }
    

    Output:

    orig: 0 @ 0x7ffee43ccc80
    0 @ 0x7ffee43ccc80 += 1 @ 0x7ffee43cc780
    orig: 1 @ 0x7ffee43ccc80
    1 @ 0x7ffee43ccc80 += 2 @ 0x2b56d095ca80
    sum: 3
    

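    The generated code applies the reduction operation to the original variable before the second private copy has been initialized from the original value. In the output above, the second "orig" line already reads 1 instead of 0, so that private copy starts at 1, ends at 2, and the final sum is 3 rather than the expected 2.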
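To make the ordering problem concrete, here is a purely sequential sketch that replays the two orderings by hand. It does not use OpenMP at all; `reduce_expected`, `reduce_observed`, `combine`, and `init_from` are hypothetical names introduced only for this illustration, and the "one entry per thread" model is an assumption based on the two distinct private addresses in the output above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Combiner and initializer mirroring my_sum_fun / my_init from the example.
static void combine(int& out, int in) { out += in; }
static int init_from(int orig) { return orig; }

// Expected order: every private copy is initialized from the original value
// before any partial result is merged back into the original.
int reduce_expected(int orig, const std::vector<int>& per_thread_work) {
    assert(!per_thread_work.empty());
    std::vector<int> priv;
    for (std::size_t t = 0; t < per_thread_work.size(); ++t)
        priv.push_back(init_from(orig));   // all copies see the untouched original
    for (std::size_t t = 0; t < priv.size(); ++t)
        priv[t] += per_thread_work[t];     // stands in for the loop body (s += 1)
    for (int p : priv)
        combine(orig, p);                  // merge only after all initializations
    return orig;
}

// Order suggested by the output: one thread merges its result into the
// original before the next thread initializes its private copy.
int reduce_observed(int orig, const std::vector<int>& per_thread_work) {
    assert(!per_thread_work.empty());
    for (int work : per_thread_work) {
        int priv = init_from(orig);        // already contains earlier merges
        priv += work;
        combine(orig, priv);
    }
    return orig;
}
```

Calling both with an original value of 0 and one unit of work per thread, e.g. `reduce_expected(0, {1, 1})` and `reduce_observed(0, {1, 1})`, gives 2 and 3 respectively, which matches the difference between the intended sum and the `sum: 3` printed above.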

    You can manually add a barrier as a workaround:

    #pragma omp parallel reduction(vec_double_plus : w)
    {
      #pragma omp for
      for (int i = 0; i < 4; ++i)
        for (int j = 0; j < w.size(); ++j)
          w[j] += 1;
      #pragma omp barrier
    }
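
For reference, a self-contained sketch of the workaround might look like the following. The `vec_double_plus` declaration is not shown in this thread, so the one below is an assumption (a commonly used element-wise sum with a zero-filled initializer), and `add_ones` is a hypothetical helper wrapping the loop from the snippet above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Assumed declaration: element-wise sum of two equally sized vectors; each
// private copy starts as a zero-filled vector of the original's size.
#pragma omp declare reduction(vec_double_plus : std::vector<double> : \
    std::transform(omp_out.begin(), omp_out.end(), omp_in.begin(), \
                   omp_out.begin(), std::plus<double>())) \
    initializer(omp_priv = std::vector<double>(omp_orig.size()))

// Hypothetical helper: adds 1 to every element of a zeroed vector, `rounds` times.
std::vector<double> add_ones(std::size_t len, int rounds) {
    assert(rounds >= 0);
    std::vector<double> w(len, 0.0);
    #pragma omp parallel reduction(vec_double_plus : w)
    {
        #pragma omp for
        for (int i = 0; i < rounds; ++i)
            for (std::size_t j = 0; j < w.size(); ++j)
                w[j] += 1;
        #pragma omp barrier  // explicit barrier, as in the workaround above
    }
    return w;
}
```

With a conforming compiler, `add_ones(3, 4)` should return a vector of three elements that are all 4.0. The code gives the same result when compiled without OpenMP support, since the pragmas are then ignored and the loop runs serially.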
    
