Instruction-Level-Parallelism Exploration

谎友^ 2021-02-14 18:13

I am just wondering if there are any useful tools out there that allow me to exploit the Instruction-Level-Parallelism in some algorithms. More specifically, I have a subset of

5 Answers
  •  生来不讨喜
    2021-02-14 18:31

    If I read you correctly, you are not interested in SIMD or threads, just getting optimal ordering of normal CPU instructions.

    The first thing to check is whether your compiler is targeting the correct CPU subtype. The compiler will usually reorder instructions to reduce dependencies from one instruction to the next, but to do this well it needs to know exactly which version of the CPU you are targeting. (In particular, older GCC versions sometimes fail to detect recent CPUs and then optimize for i386.)
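
    One rough way to see which target the compiler actually assumed is to inspect the predefined feature macros from inside the program. The sketch below is an illustration rather than part of the original answer: the function name `assumed_target_features` is made up, and it assumes GCC or Clang on x86, where macros such as `__SSE2__` and `__AVX2__` are defined according to the `-march` setting.

```cpp
#include <string>

// Hypothetical helper (not from the original answer): lists the x86
// instruction-set extensions the compiler assumed for this translation unit.
// Relies on predefined macros that GCC and Clang set based on -march.
std::string assumed_target_features() {
    std::string features;
#if defined(__SSE2__)
    features += "sse2 ";
#endif
#if defined(__AVX__)
    features += "avx ";
#endif
#if defined(__AVX2__)
    features += "avx2 ";
#endif
#if defined(__FMA__)
    features += "fma ";
#endif
    // None of the macros above were defined: the compiler is targeting a
    // generic baseline (or a non-x86 architecture).
    if (features.empty()) {
        features = "baseline";
    }
    return features;
}
```

    Printing the result once with default flags and once with `-march=native` should show whether the compiler recognizes the host CPU. If both builds report the same minimal set on a recent machine, the target is probably being detected incorrectly.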

    The second thing you can do is check your compiler's inlining decisions by looking at the generated assembly. Inlining small functions in algorithms can increase code size, but it gives the compiler more opportunity to optimize, since multiple calculations can be done in parallel. I often resort to forced inlining.
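
    As a minimal sketch of forced inlining combined with independent dependency chains, consider the following. The `FORCE_INLINE` macro and both function names are hypothetical and chosen only for illustration; the attribute spellings are the ones GCC/Clang (`always_inline`) and MSVC (`__forceinline`) provide.

```cpp
#include <cstddef>

// Hypothetical portability macro, not from the original answer.
#if defined(_MSC_VER)
#  define FORCE_INLINE __forceinline
#elif defined(__GNUC__) || defined(__clang__)
#  define FORCE_INLINE inline __attribute__((always_inline))
#else
#  define FORCE_INLINE inline
#endif

// Small helper that should be merged into the caller's loop body so the
// compiler can schedule its multiply together with the surrounding adds.
FORCE_INLINE double squared(double x) { return x * x; }

// Sum of squares with four independent accumulators. Each accumulator forms
// its own dependency chain, so the CPU can overlap the four additions instead
// of waiting for one long serial chain.
double sum_of_squares(const double* data, std::size_t n) {
    double acc0 = 0.0, acc1 = 0.0, acc2 = 0.0, acc3 = 0.0;
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        acc0 += squared(data[i]);
        acc1 += squared(data[i + 1]);
        acc2 += squared(data[i + 2]);
        acc3 += squared(data[i + 3]);
    }
    // Handle the remaining 0 to 3 elements.
    for (; i < n; ++i) {
        acc0 += squared(data[i]);
    }
    return (acc0 + acc1) + (acc2 + acc3);
}
```

    Splitting the sum changes the order of the floating-point additions, which can change rounding slightly. That is one reason a compiler will generally not do this rewrite on its own under strict floating-point rules, and why it may have to be written by hand.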

    Lastly, for Intel CPUs, Intel's own C++ compiler claims to be the best at this. Intel also has the VTune profiler, which can specifically report how efficiently the ALUs are used in your program's hotspots.
