MFENCE/SFENCE/etc “serialize memory but not instruction execution”?

Question

Intel's System Programming Guide, section 8.3, states regarding MFENCE/SFENCE/LFENCE:

"The following instructions are memory-ordering instructions, not serializing instructions. These drain the data memory subsystem. They do not serialize the instruction execution stream."

I'm trying to figure out why this matters. In multi-threaded code, memory reads and writes are exactly what need to happen in a well-defined order. Of course, the order in which I/O happens might matter, but I/O instructions are "serializing instructions" anyway. The CPU should be free to reorder instructions that (for example) do arithmetic in registers however it likes; I don't see any reason why you would ever want to "serialize" such operations.

Is there any case where a fully serializing instruction is really needed, and MFENCE's serialization of only loads and stores is "not enough"?

Answer 1:

Is there any case where a fully serializing instruction is really needed, and MFENCE's serialization of only loads and stores is "not enough"?

Benchmarking and code profiling.

If you're trying to measure the performance of a code sequence, particularly if it's very short, it can be important to ensure that parts of the benchmarked operations aren't being executed outside the timed sequence. For instance, if your code looks something like this pseudocode:

start = RDTSC()
do some stuff
end = RDTSC()
cycles = end - start

It's important to make sure that none of the code in the middle is executed before the first RDTSC, or after the second one.

Happily, there's a perfect instruction available for this: CPUID is fully serializing.
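Here is one way the pseudocode above might be turned into C with CPUID bracketing the timed region. This is a sketch under stated assumptions: an x86-64 target and GCC or Clang (for the inline-asm syntax and `__rdtsc` from `<x86intrin.h>`). The names `serialize_cpuid`, `time_with_cpuid` and `spin_work` are hypothetical, not from the answer. CPUID is written as volatile inline asm with a "memory" clobber so that the compiler can neither delete it nor move memory accesses across it.

```c
#include <stdint.h>
#include <x86intrin.h>  /* __rdtsc(): GCC/Clang, x86 targets only */

/* Run CPUID (leaf 0) purely for its serializing effect.
   CPUID overwrites EAX, EBX, ECX and EDX; the results are discarded. */
static inline void serialize_cpuid(void)
{
    uint32_t eax = 0, ebx, ecx = 0, edx;
    __asm__ __volatile__("cpuid"
                         : "+a"(eax), "=b"(ebx), "+c"(ecx), "=d"(edx)
                         :
                         : "memory");
    (void)eax; (void)ebx; (void)ecx; (void)edx;
}

/* Hypothetical helper: time one call of work(arg) in TSC ticks.
   - CPUID before the first RDTSC lets earlier code finish first.
   - CPUID right after it keeps the timed work from starting before it.
   - CPUID before the second RDTSC keeps that RDTSC from executing
     before the timed work has completed.
   The two CPUIDs inside the interval are counted as well, so a careful
   benchmark would measure and subtract that overhead. */
static uint64_t time_with_cpuid(void (*work)(void *), void *arg)
{
    serialize_cpuid();
    uint64_t start = __rdtsc();
    serialize_cpuid();
    work(arg);
    serialize_cpuid();
    uint64_t end = __rdtsc();
    return end - start;
}

/* Example workload; the volatile sink stops the compiler deleting the loop.
   arg optionally points to an iteration count (default 1000). */
static void spin_work(void *arg)
{
    unsigned n = arg ? *(const unsigned *)arg : 1000u;
    volatile unsigned sink = 0;
    for (unsigned i = 0; i < n; i++)
        sink += i;
}
```

For example, `time_with_cpuid(spin_work, NULL)` returns the TSC ticks spent in a 1000-iteration loop plus the CPUID overhead.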

Answer 2:

Section 8.3 of the Intel manual contains a full list of instructions that are considered to be fully serializing (See also: How many memory barriers instructions does an x86 CPU have?):

Privileged serializing instructions: INVD, INVEPT, INVLPG, INVVPID, LGDT, LIDT, LLDT, LTR, MOV (to control register, with the exception of MOV CR8), MOV (to debug register), WBINVD, and WRMSR.

Non-privileged serializing instructions: CPUID, IRET, and RSM.

I think all of these instructions except CPUID are serializing because their semantics require it. For example, if WBINVD were not serializing, it could be reordered with earlier or later operations that access memory, and it wouldn't be clear what state the cache hierarchy is in when the instruction retires.

The CPUID instruction was first introduced in the Pentium processor, which is a speculative, in-order processor. One typical use of this instruction is to check whether a particular feature is supported on the current processor and, if it is, jump to a piece of code that uses that feature (for example, code that executes an instruction older processors don't support). I'm not sure what complication would arise if CPUID were not serializing. For example, suppose it is used to check whether the processor supports some particular instruction, and the branch predictor incorrectly predicts that the path containing that instruction is taken. The decoders will then treat that instruction as invalid, but this situation can be handled with the same mechanism already used for branch mispredictions and invalid instructions.
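The check-then-branch pattern described above might look like this in C. It is a sketch assuming x86-64 and GCC or Clang (`__get_cpuid` from `<cpuid.h>`, SSE2 intrinsics from `<emmintrin.h>`). SSE2 (CPUID leaf 1, EDX bit 26) is used only because it is easy to test; on x86-64 it is always present, so in practice the same pattern matters for newer extensions. The function names are hypothetical.

```c
#include <cpuid.h>      /* __get_cpuid(): GCC/Clang, x86 targets only */
#include <emmintrin.h>  /* SSE2 intrinsics */

/* CPUID leaf 1 reports feature flags in EDX; bit 26 is SSE2. */
#define LEAF1_EDX_SSE2 (1u << 26)

static int cpu_has_sse2(void)
{
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;                      /* leaf 1 not available */
    return (edx & LEAF1_EDX_SSE2) != 0;
}

/* Path that uses the feature: adds four ints with one SSE2 add. */
static void add4_sse2(const int *a, const int *b, int *out)
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_add_epi32(va, vb));
}

/* Fallback path for processors without the feature. */
static void add4_scalar(const int *a, const int *b, int *out)
{
    for (int i = 0; i < 4; i++)
        out[i] = a[i] + b[i];
}

/* Ask CPUID, then branch to the code that needs the feature. As the
   answer argues, if this branch is mispredicted, the wrong path is
   discarded like any other misprediction. */
static void add4(const int *a, const int *b, int *out)
{
    if (cpu_has_sse2())
        add4_sse2(a, b, out);
    else
        add4_scalar(a, b, out);
}
```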

The RDTSC instruction was also first introduced in the Pentium processor. However, nowhere in the Pentium software developer's manual does it say that you need to use a serializing instruction with RDTSC. This makes sense because the processor was in-order and 2-wide, so RDTSC could only overlap with a single instruction that precedes or follows it. The Pentium Pro manual, on the other hand, does say that you need to use a serializing instruction because of out-of-order execution. The important point here is that CPUID was serializing even on a processor on which we didn't need it for RDTSC. This suggests that the original reason CPUID is serializing is something else. The Pentium manual does mention two situations where it is necessary to use a serializing instruction:

15.4. ORDERING OF I/O

Using memory-mapped I/O, therefore, creates the possibility that an I/O read might be performed before the memory write of a previous instruction. To eliminate this possibility on the Intel486 CPU, use an I/O instruction for the read. To eliminate this possibility on the Pentium processor, insert one of the serializing instructions, such as CPUID, between operations.
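A minimal C sketch of the quoted Pentium advice (write, serialize with CPUID, then read), assuming x86-64 and GCC or Clang inline asm. The register pointers and `mmio_write_then_read` are hypothetical: a real driver would obtain them from its OS's MMIO mapping facilities, while for trying the code out any ordinary memory works.

```c
#include <stdint.h>

/* Run CPUID (leaf 0) purely for its serializing effect; results discarded.
   The "memory" clobber also stops the compiler reordering around it. */
static inline void serialize_cpuid(void)
{
    uint32_t eax = 0, ebx, ecx = 0, edx;
    __asm__ __volatile__("cpuid"
                         : "+a"(eax), "=b"(ebx), "+c"(ecx), "=d"(edx)
                         :
                         : "memory");
    (void)eax; (void)ebx; (void)ecx; (void)edx;
}

/* Hypothetical device access: write a command register, then read a
   status register, with a serializing instruction between the two so the
   read cannot be performed before the earlier write. */
static uint32_t mmio_write_then_read(volatile uint32_t *cmd_reg,
                                     const volatile uint32_t *status_reg,
                                     uint32_t cmd)
{
    *cmd_reg = cmd;        /* memory-mapped write */
    serialize_cpuid();     /* serialize between the write and the read */
    return *status_reg;    /* memory-mapped read */
}
```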

18.2.3. Self-Modifying Code

Because the linear address of the write is checked against the linear address of the instructions that have been prefetched, special care must be taken for self-modifying code to work correctly when the physical addresses of the instruction and the written data are the same, but the linear addresses differ. In such cases, it is necessary to execute a serializing operation after the write and before executing the modified instruction.

None of the serializing instructions other than CPUID is suitable for general-purpose serialization, because they are either privileged, can significantly impact performance, change the control flow of the program, or change segment descriptor tables. CPUID is not perfect either, because it changes the values of some architectural registers. So I think Intel had a choice between introducing a new general-purpose serializing instruction that does nothing but serialize the pipeline, or making CPUID a serializing instruction. It could also be the case that CPUID itself requires serialization for some reason. Either way, it seems they decided to make CPUID play the role of a general-purpose serializing instruction. This makes sense considering that the cost of CPUID overwriting four registers (EAX, EBX, ECX, and EDX) is negligible compared to the cost of serialization itself.

Later, starting with the Pentium Pro, Intel's manual recommended using CPUID together with RDTSC to make accurate measurements (see: Get CPU cycle count?).

Source: https://stackoverflow.com/questions/26683097/mfence-sfence-etc-serialize-memory-but-not-instruction-execution

Tags

assembly

x86