Differences between pragmas simd and ivdep vector always?

Submitted by 末鹿安然 on 2019-12-06 06:08:31

Question


I am currently trying to vectorize a program and I have observed some odd behaviour.

It seems that a for loop is vectorized when using

#pragma simd

(262): (col. 3) remark: SIMD LOOP WAS VECTORIZED.

but it isn't when I use

#pragma vector always

#pragma ivdep

(262): (col. 3) remark: loop was not vectorized: existence of vector dependence.

I always thought that both approaches perform the same vectorization.


Answer 1:


#pragma simd is an explicit vectorization tool that lets the developer enforce vectorization, as described at https://software.intel.com/en-us/node/514582, whereas #pragma vector tells the compiler that a loop should be vectorized according to the pragma's arguments. Here the argument is always, which means "ignore the compiler's cost/efficiency heuristics and go ahead with vectorization". More information on #pragma vector is available at https://software.intel.com/en-us/node/514586. This does not mean that #pragma simd will produce wrong results whenever #pragma vector always fails to vectorize. When #pragma simd is used with the right set of clauses, it can vectorize the loop and still produce the right result. The small code snippet below demonstrates this:

   void foo(float *a, float *b, float *c, int N){
   #pragma vector always
   #pragma ivdep
   //#pragma simd vectorlength(2)
   for(int i = 2; i < N; i++)
        a[i] = a[i-2] + b[i] + c[i];
   return;
   }

Compiling this code using ICC will produce the following vectorization report:

$ icc -c -vec-report2 test11.cc
test11.cc(5): (col. 1) remark: loop was not vectorized: existence of vector dependence

By default ICC targets SSE2, which uses 128-bit XMM registers. One XMM register holds 4 floats, but processing 4 floats at a time creates a vector dependence: each iteration reads a[i-2], so the third and fourth lanes would need values that the first and second lanes have not yet stored. So what #pragma vector always reports is right. If we process just 2 floats at a time instead of 4, however, we can vectorize this loop without corrupting the results. The code and its vectorization report are shown below:

   void foo(float *a, float *b, float *c, int N){
   //#pragma vector always
   //#pragma ivdep
   #pragma simd vectorlength(2)
   for(int i = 2; i < N; i++)
        a[i] = a[i-2] + b[i] + c[i];
   return;
   }

$ icc -c -vec-report2 test11.cc
test11.cc(5): (col. 1) remark: SIMD LOOP WAS VECTORIZED

However, #pragma vector has no clause that explicitly specifies the vector length to use when vectorizing the loop. This is where #pragma simd really comes in handy. When it is used with the right clauses, ones that best describe the computation in vector form, the compiler generates the requested vector code, and that code will not produce wrong results. The Intel(R) Cilk(TM) Plus White Paper published at https://software.intel.com/sites/default/files/article/402486/intel-cilk-plus-white-paper.pdf has sections titled "Usage of #pragma simd vectorlength clause" and "Usage of #pragma simd reduction and private clause" that explain how to use #pragma simd with the right clauses. The clauses help developers express to the compiler what they want to achieve, and the compiler generates vector code accordingly. It is highly recommended to use #pragma simd with the relevant clauses wherever needed to best express the loop logic to the compiler.

Also, inner loops are traditionally targeted for vectorization, but #pragma simd can be used to vectorize outer loops too. More information on this is available at https://software.intel.com/en-us/articles/outer-loop-vectorization.




Answer 2:


#pragma simd enforces vectorization of the loop, regardless of cost or safety.

#pragma vector always tells the compiler to ignore efficiency heuristics when deciding whether to vectorize. Code that vectorizes only when this pragma is added might be slower.

#pragma ivdep tells the compiler to ignore assumed data dependences that inhibit vectorization (for example, assumed loop-carried dependences), but not proven ones. For example, it might assume two pointers do not point to the same memory and vectorize. However, it won't ignore a proven loop-carried dependence (a[i] = a[i - 1] * c), whereas #pragma simd might.

One reason your code might have vectorized only with #pragma simd is that a proven dependence was ignored. You might want to verify that your program output is correct.

Source: Intel-specific pragmas documentation (http://software.intel.com/en-us/node/462880)



Source: https://stackoverflow.com/questions/21681300/diferences-between-pragmas-simd-and-ivdep-vector-always
