I have seen the related questions, including here and here, but it seems that the only instruction ever mentioned for serializing rdtsc is cpuid.
Unfortunately, cpuid takes roughly 1000 cycles on my system, so I am wondering if anyone knows of a cheaper serializing instruction (fewer cycles and no reads or writes to memory)?
I looked at iret, but that seems to change control flow, which is also undesirable.
I have actually looked at the whitepaper linked in Alex's answer about rdtscp, but it says:
The RDTSCP instruction waits until all previous instructions have been executed before reading the counter. However, subsequent instructions may begin execution before the read operation is performed.
That second point seems to make it less than ideal.
Have you looked at the rdtscp instruction? This is the read-serialized version of rdtsc.
For benchmarking, I would recommend reading this whitepaper. It provides a couple of best practices for measuring clock ticks.
Alex (Intel)
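To make the rdtscp suggestion concrete, here is a minimal sketch of reading the counter at the end of a timed region. It assumes x86-64 with GCC or Clang and the `__rdtscp` and `_mm_lfence` intrinsics from `<x86intrin.h>`; the helper name `tsc_end` is hypothetical. The lfence after rdtscp follows the note in the Intel manual's RDTSCP entry about keeping subsequent instructions from starting before the read; how well that holds on non-Intel processors is not something this sketch verifies.

```c
#include <stdint.h>
#include <x86intrin.h>  /* GCC/Clang intrinsics; assumed toolchain */

/* Hypothetical helper: read the TSC at the end of a timed region. */
static inline uint64_t tsc_end(void)
{
    unsigned int aux;             /* receives IA32_TSC_AUX; unused here */
    uint64_t t = __rdtscp(&aux);  /* waits until earlier instructions have executed */
    _mm_lfence();                 /* holds back later instructions until the read is done */
    return t;
}
```

Neither instruction touches memory or changes control flow, which matches the constraints in the question.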
The answer is apparently not. The Intel manual, Volume 3A, lists only three non-privileged serializing instructions (cpuid, iret, and rsm), and the latter two seem to have control-flow side effects.
Well, I guess this is helpful: lfence. See the Intel 64 and IA-32 Architectures Software Developer's Manual, Vol. 2B, p. 4-301.
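As a rough sketch of the lfence approach, assuming x86-64 with GCC or Clang and the `__rdtsc` and `_mm_lfence` intrinsics from `<x86intrin.h>` (the names `fenced_rdtsc` and `cycles_for` are hypothetical): the Intel manual's RDTSC entry suggests lfence immediately before rdtsc to wait for earlier instructions, and lfence immediately after to keep later ones from starting early. Note that lfence is not in the manual's list of architecturally serializing instructions, and its behaviour on AMD processors can depend on system configuration, so treat this as an ordering barrier for timing rather than a full replacement for cpuid.

```c
#include <stdint.h>
#include <x86intrin.h>  /* GCC/Clang intrinsics; assumed toolchain */

/* Hypothetical helper: rdtsc fenced on both sides with lfence. */
static inline uint64_t fenced_rdtsc(void)
{
    _mm_lfence();             /* earlier instructions complete before the read */
    uint64_t t = __rdtsc();
    _mm_lfence();             /* later instructions do not start before the read */
    return t;
}

/* Hypothetical helper: TSC ticks spent in fn, including call overhead. */
static inline uint64_t cycles_for(void (*fn)(void))
{
    uint64_t start = fenced_rdtsc();
    fn();
    uint64_t end = fenced_rdtsc();
    return end - start;
}
```

The result counts TSC ticks, which on processors with an invariant TSC are not the same as core clock cycles; subtracting the cost of an empty measurement is a common way to remove the fence and call overhead.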
Source: https://stackoverflow.com/questions/23280697/is-there-a-cheaper-serializing-instruction-than-cpuid