How do you use the pause assembly instruction in 64-bit C++ code?

可紊 提交于 2019-12-03 16:34:14

问题


Since inlined assembly is not supported by VC++ 2010 in 64-bit code, how do I get a pause x86-64 instruction into my code? There does not appear to be an intrinsic for this like there is for many other common assembly instructions (e.g., __rdtsc(), __cpuid(), etc...).

On the why side, I want the instruction to help with a busy wait use case, so that the (hyperthreaded) CPU is available to other threads running on said CPU (See: Performance Insights at intel.com). The pause instruction is very helpful for this use case as well as spin-lock implementations, I can't understand why MS did not include it as an intrinsic.

Thanks


回答1:


Wow, this was a very hard problem to track down, but in case anybody else needs the x86-64 pause instruction:

The YieldProcessor() macro from windows.h expands to the undocumented _mm_pause intrinsic, which ultimately expands to the pause instruction in 32-bit and 64-bit code.

This is completely undocumented, by the way, with partial (and incorrect for VC++ 2010 documentation) for YieldProcessor() appearing in MSDN.

Here is an example of what a block of YieldProcessor() macros compiles into:

    19:     ::YieldProcessor();
000000013FDB18A0 F3 90                pause  
    20:     ::YieldProcessor();
000000013FDB18A2 F3 90                pause  
    21:     ::YieldProcessor();
000000013FDB18A4 F3 90                pause  
    22:     ::YieldProcessor();
000000013FDB18A6 F3 90                pause  
    23:     ::YieldProcessor();
000000013FDB18A8 F3 90                pause  

By the way, each pause instruction seems to produce about a 9 cycle delay on the Nehalem architecture, on the average (i.e., 3 ns on a 3.3 GHz CPU).




回答2:


The _mm_pause() intrinsic is fully documented by Intel and supported by all the major x86 compilers portably across OSes. IDK if MS's docs were lacking in the past, or if you just missed it ~7 years go.

#include <immintrin.h> and use it. (Or for ancient compilers #include <emmintrin.h> for SSE2).

#include <immintrin.h>

void test() {
    _mm_pause();
    _mm_pause();
}

compiles to this asm on all 4 of gcc/clang/ICC/MSVC (on the Godbolt compiler explorer):

test():                               # @test()
    pause
    pause
    ret

On CPUs without SSE2, it decodes as rep nop which is just a nop. Cross-platform implementation of the x86 pause instruction

Gcc even knows this, and still accepts _mm_pause() when compiling with -mno-sse. (Normally gcc and clang reject intriniscs for instructions that aren't enabled, unlike MSVC.) Amusingly, gcc even emits rep nop in its asm output, while the other three emit pause. They assemble to same machine code, of course.


Pause idles the front-end of that hyperthread for about 5 cycles on Sandybridge-family until Skylake. On Skylake, Intel increased it to ~100 cycles to save more power in spin-wait loops.

See also What is the purpose of the "PAUSE" instruction in x86?.



来源:https://stackoverflow.com/questions/5833527/how-do-you-use-the-pause-assembly-instruction-in-64-bit-c-code

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!