问题
Since inlined assembly is not supported by VC++ 2010 in 64-bit code, how do I get a pause
x86-64 instruction into my code? There does not appear to be an intrinsic for this like there is for many other common assembly instructions (e.g., __rdtsc()
, __cpuid()
, etc...).
On the why side, I want the instruction to help with a busy wait use case, so that the (hyperthreaded) CPU is available to other threads running on said CPU (See: Performance Insights at intel.com). The pause
instruction is very helpful for this use case as well as spin-lock implementations, I can't understand why MS did not include it as an intrinsic.
Thanks
回答1:
Wow, this was a very hard problem to track down, but in case anybody else needs the x86-64 pause
instruction:
The YieldProcessor()
macro from windows.h
expands to the undocumented _mm_pause
intrinsic, which ultimately expands to the pause
instruction in 32-bit and 64-bit code.
This is completely undocumented, by the way, with partial (and incorrect for VC++ 2010 documentation) for YieldProcessor() appearing in MSDN.
Here is an example of what a block of YieldProcessor() macros compiles into:
19: ::YieldProcessor();
000000013FDB18A0 F3 90 pause
20: ::YieldProcessor();
000000013FDB18A2 F3 90 pause
21: ::YieldProcessor();
000000013FDB18A4 F3 90 pause
22: ::YieldProcessor();
000000013FDB18A6 F3 90 pause
23: ::YieldProcessor();
000000013FDB18A8 F3 90 pause
By the way, each pause instruction seems to produce about a 9 cycle delay on the Nehalem architecture, on the average (i.e., 3 ns on a 3.3 GHz CPU).
回答2:
The _mm_pause() intrinsic is fully documented by Intel and supported by all the major x86 compilers portably across OSes. IDK if MS's docs were lacking in the past, or if you just missed it ~7 years go.
#include <immintrin.h>
and use it. (Or for ancient compilers #include <emmintrin.h>
for SSE2).
#include <immintrin.h>
void test() {
_mm_pause();
_mm_pause();
}
compiles to this asm on all 4 of gcc/clang/ICC/MSVC (on the Godbolt compiler explorer):
test(): # @test()
pause
pause
ret
On CPUs without SSE2, it decodes as rep nop
which is just a nop
. Cross-platform implementation of the x86 pause instruction
Gcc even knows this, and still accepts _mm_pause()
when compiling with -mno-sse
. (Normally gcc and clang reject intriniscs for instructions that aren't enabled, unlike MSVC.) Amusingly, gcc even emits rep nop
in its asm output, while the other three emit pause
. They assemble to same machine code, of course.
Pause idles the front-end of that hyperthread for about 5 cycles on Sandybridge-family until Skylake. On Skylake, Intel increased it to ~100 cycles to save more power in spin-wait loops.
See also What is the purpose of the "PAUSE" instruction in x86?.
来源:https://stackoverflow.com/questions/5833527/how-do-you-use-the-pause-assembly-instruction-in-64-bit-c-code