Is mfence before rdtsc necessary on the x86_64 platform?


无人共我 2021-01-07 01:38

unsigned int lo = 0;
unsigned int hi = 0;
__asm__ __volatile__ (
    "mfence;rdtsc" : "=a"(lo), "=d"(hi) : : "memory"
);

mfence

2 answers

北海茫月 (original poster)

2021-01-07 02:00

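mfence is there as a barrier that forces ordering in the CPU before rdtsc, so that the counter is not read while earlier instructions are still in flight.

More often you will find cpuid there, which is a fully serializing instruction.

A quote from Intel's documentation on using rdtsc makes this clearer: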

Starting with the Intel Pentium processor, most Intel CPUs support out-of-order execution of the code. The purpose is to optimize the penalties due to the different instruction latencies. Unfortunately this feature does not guarantee that the temporal sequence of the single compiled C instructions will respect the sequence of the instruction themselves as written in the source C file. When we call the RDTSC instruction, we pretend that that instruction will be executed exactly at the beginning and at the end of code being measured (i.e., we don’t want to measure compiled code executed outside of the RDTSC calls or executed in between the calls themselves). The solution is to call a serializing instruction before calling the RDTSC one. A serializing instruction is an instruction that forces the CPU to complete every preceding instruction of the C code before continuing the program execution. By doing so we guarantee that only the code that is under measurement will be executed in between the RDTSC calls and that no part of that code will be executed outside the calls.

TL;DR: without a serializing instruction before rdtsc, you have no idea when rdtsc actually started to execute, so the measurement may be incorrect.
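To make that concrete, here is a minimal sketch of a fenced timestamp read. It is not the code from the question: it swaps mfence for lfence, which Intel documents as not executing until all prior instructions have completed locally. The helper name fenced_rdtsc is made up for illustration, and the sketch assumes GCC or Clang on x86_64. On AMD parts the strength of lfence can depend on system configuration, so treat this as an assumption to verify on your hardware.

```c
#include <stdint.h>
#include <x86intrin.h>  /* __rdtsc, _mm_lfence (GCC/Clang, x86 only) */

/* Hypothetical helper: read the TSC with a fence on each side. */
static inline uint64_t fenced_rdtsc(void)
{
    _mm_lfence();            /* earlier instructions finish before rdtsc runs */
    uint64_t t = __rdtsc();  /* EDX:EAX combined into one 64-bit value */
    _mm_lfence();            /* later instructions do not start before the read */
    return t;
}
```

Typical use would be to take one reading before the code under test and one after, then subtract: uint64_t t0 = fenced_rdtsc(); /* work */ uint64_t cycles = fenced_rdtsc() - t0;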

Hint: use rdtscp when possible. It waits until all previous instructions have executed before reading the counter.
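A rough sketch of the rdtscp route, assuming GCC or Clang on x86_64. The helper names has_rdtscp and read_tscp are invented for this example. The CPUID check (leaf 0x80000001, EDX bit 27) is there because rdtscp is not available on every CPU or virtual machine.

```c
#include <cpuid.h>      /* __get_cpuid (GCC/Clang) */
#include <stdint.h>
#include <x86intrin.h>  /* __rdtscp */

/* Hypothetical helper: 1 if CPUID reports RDTSCP support, else 0. */
static inline int has_rdtscp(void)
{
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(0x80000001u, &eax, &ebx, &ecx, &edx))
        return 0;  /* extended leaf not available */
    return (int)((edx >> 27) & 1u);
}

/* Hypothetical helper: read the TSC via rdtscp.
 * *aux receives the IA32_TSC_AUX value, which the OS typically
 * uses to encode the CPU number. Call only if has_rdtscp() is 1. */
static inline uint64_t read_tscp(unsigned int *aux)
{
    return __rdtscp(aux);
}
```

Note that rdtscp only waits for earlier instructions. Later instructions may still begin before the read completes, so an lfence after it is a common addition at the end of a measured region.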

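Re: "Based on my test, cpu reorder is not found."

There is still no guarantee that it won't happen. That is also why the original code had "memory" in the clobber list: it tells the compiler that memory may be modified, which prevents the compiler from reordering memory accesses across the asm statement.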
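For reference, the asm statement from the question can be wrapped in a function that returns the full 64-bit counter. This is only a sketch assuming GCC or Clang on x86_64, and the name mfence_rdtsc is made up here. The "memory" clobber and __volatile__ constrain the compiler only. The mfence is what acts on the CPU.

```c
#include <stdint.h>

/* Hypothetical wrapper around the question's mfence;rdtsc sequence. */
static inline uint64_t mfence_rdtsc(void)
{
    unsigned int lo, hi;
    __asm__ __volatile__ (
        "mfence\n\t"          /* CPU-level barrier before the read */
        "rdtsc"               /* low 32 bits in EAX, high 32 bits in EDX */
        : "=a"(lo), "=d"(hi)
        :
        : "memory"            /* compiler barrier: no memory accesses moved across */
    );
    return ((uint64_t)hi << 32) | lo;
}
```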
