Question
Furthermore, how does the compiler decide how far to unroll a loop, assuming all operations in the loop are completely independent of other iterations?
Answer 1:
For MSVC there is only a vector independence hint: http://msdn.microsoft.com/en-us/library/hh923901.aspx
#pragma loop( ivdep )
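As a rough sketch of where this hint goes: the pragma sits immediately before the loop it applies to. The function name scale_into and the _MSC_VER guard are assumptions made for illustration; the guard only keeps other compilers from seeing a pragma they do not know.

```cpp
#include <cassert>

// Hypothetical example: writes c * in[i] into out[i].
// The compiler cannot prove that out and in do not overlap, so it may assume
// a dependence between iterations. The caller promises they are distinct.
void scale_into(float* out, const float* in, float c, int n) {
    assert(n >= 0);
#ifdef _MSC_VER
// MSVC only: ignore assumed vector dependencies in the next loop.
#pragma loop(ivdep)
#endif
    for (int i = 0; i < n; ++i) {
        out[i] = c * in[i];
    }
}
```

If out and in really do overlap, the hint is a broken promise and the vectorized result may differ from the scalar one.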
For many other compilers, like Intel/IBM, there are several pragma hints for optimizing a loop:
#pragma unroll
#pragma loop count N
#pragma ivdep
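A minimal sketch of how such hints are attached to a loop, assuming an Intel compiler. The function saxpy, the trip-count numbers, and the unroll factor 4 are made up for illustration, and the guard macros are an assumption so that other compilers simply skip the pragmas.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical example: y[i] += a * x[i], with every iteration independent.
void saxpy(float a, const float* x, float* y, std::size_t n) {
    assert((x != nullptr && y != nullptr) || n == 0);
#if defined(__INTEL_COMPILER) || defined(__INTEL_LLVM_COMPILER)
// Expected trip counts (illustrative values, not measurements).
#pragma loop_count min(64), max(4096), avg(1024)
// Ask for the loop body to be unrolled 4 times.
#pragma unroll(4)
#endif
    for (std::size_t i = 0; i < n; ++i) {
        y[i] += a * x[i];
    }
}
```

These are hints, not commands: the compiler may still pick a different unroll factor or decline to unroll.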
There is a thread with the MSVC++ people about the unroll heuristic: http://social.msdn.microsoft.com/Forums/en-US/vcgeneral/thread/d0b225c2-f5b0-4bb9-ac6a-4d4f61f7cb17/
VC tries to balance execution speed and code size. You can change the balance by using the /O1 or /O2 flags, but even when optimizing for speed VC tries to conserve code size as well.
Basically, unrolling increases code size, so it may be limited in the Os and O1 modes.
PS: A pragma looks like a preprocessor directive, but it is not. It is a directive for the compiler, and the preprocessor ignores it (leaves it in place).
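To make the code-size trade-off concrete, here is roughly what unrolling by a factor of 4 looks like when written by hand. This is only an illustration of the transformation, not what any particular compiler emits; both function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Rolled form: one addition and one loop test per element.
long long sum_rolled(const int* v, std::size_t n) {
    assert(v != nullptr || n == 0);
    long long total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        total += v[i];
    }
    return total;
}

// Unrolled by 4: one loop test per four elements, but a larger body
// plus a cleanup loop for the leftover 0 to 3 elements.
long long sum_unrolled4(const int* v, std::size_t n) {
    assert(v != nullptr || n == 0);
    long long total = 0;
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        total += v[i];
        total += v[i + 1];
        total += v[i + 2];
        total += v[i + 3];
    }
    for (; i < n; ++i) {  // cleanup loop
        total += v[i];
    }
    return total;
}
```

The unrolled version does the same work with fewer branches, at the cost of more instructions, which is the balance a size-oriented mode tips away from.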
Answer 2:
In the case of Intel Compiler:
#pragma loop count N helps the compiler choose the best strategy for vectorizing the loop, which saves time, so we can say it helps drive loop unrolling. Examples:
#pragma loop_count min(n),max(n),avg(n)
#pragma unroll (n) works only when used with the -O3 flag. You can use the following strategy to unroll your loop according to the target processor.
Despite the larger code generated by loop unrolling, it may be worth it, since the compiler will produce a version of the loop for scalar operations as well as one for vector operations.
In cases where unrolling hurts performance, for instance a loop with 20 iterations and a vector length of 16, the result is one loop that executes 16 operations at once and a remainder loop that executes the other 4 sequentially. To avoid the remainder loop generated by the compiler, we can put one of these before the loop:
#pragma vector novecremainder //or -mP2OPT_hpo_vec_peel = F to disable peel and remainder loops (compiler internal option)
or
#pragma nounroll //where unrolling is not worth it at all
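The arithmetic behind the 20-iteration example can be sketched like this. The struct and function names are hypothetical, and the model is deliberately simple: it ignores peel loops and any extra unrolling of the vector loop.

```cpp
#include <cassert>

// How a trip count divides into full vector iterations and leftovers.
struct LoopSplit {
    int vector_iterations;     // iterations that process a full vector
    int remainder_iterations;  // scalar iterations left over
};

LoopSplit split_trip_count(int trip_count, int vector_length) {
    assert(trip_count >= 0 && vector_length > 0);
    return {trip_count / vector_length, trip_count % vector_length};
}
```

For a trip count of 20 and a vector length of 16 this gives 1 vector iteration and 4 remainder iterations, matching the case described above.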
Just to clarify #pragma ivdep:
- It gives specific hints to modify compiler heuristics about dependencies and must be used only when we know that the assumed dependencies are safe to ignore.
- Most importantly, it overrides potential dependencies, but the compiler still performs a dependence analysis. Try #pragma simd to vectorize regardless of any analysis.
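A small sketch of a case where the hint is legitimate, assuming an Intel compiler. The function name shift_scale is hypothetical and the guard macros are an assumption. The compiler has to assume k might be negative, which would make each iteration depend on an earlier one; the programmer knows k is never negative.

```cpp
#include <cassert>

// Hypothetical example: a[i] = a[i + k] * c for i in [0, m).
// With k >= 0 every read is of an element that has not been written yet,
// so the assumed dependence does not actually exist.
void shift_scale(int* a, int k, int c, int m) {
    assert(k >= 0 && m >= 0);
#if defined(__INTEL_COMPILER) || defined(__INTEL_LLVM_COMPILER)
// Tell the vectorizer to ignore the assumed dependence on k.
#pragma ivdep
#endif
    for (int i = 0; i < m; ++i) {
        a[i] = a[i + k] * c;
    }
}
```

The array must hold at least m + k elements. If k could be negative, the hint would be wrong and should be removed.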
Hope this helps.
Source: https://stackoverflow.com/questions/12830450/are-there-any-preprocessor-directives-that-control-loop-unrolling