Do locked instructions provide a barrier between weakly-ordered accesses?

旧时难觅i 2021-01-19 00:50

On x86, lock-prefixed instructions such as lock cmpxchg provide barrier semantics in addition to their atomic operation: for normal memory access o

2 answers
  •  深忆病人
    2021-01-19 01:41

    On all 64-bit AMD processors, MFENCE is a fully serializing instruction and the Lock-prefixed instructions are not. However, both serialize all memory accesses according to the AMD manual V2 7.4.2:

    All previous loads and stores complete to memory or I/O space before a memory access for an I/O, locked or serializing instruction is issued.

    All loads and stores associated with the I/O and locked instructions complete to memory (no buffered stores) before a load or store from a subsequent instruction is issued.

    There are no exceptions or errata related to the serialization properties of these instructions.

    It's clear from the Intel manual and documents that both serialize all stores, with no exceptions or related errata. MFENCE also serializes all loads, with one erratum documented for most processors based on the Skylake, Kaby Lake, and Coffee Lake microarchitectures, which states that MOVNTDQA from WC memory may pass earlier MFENCE instructions. In addition, many processors based on the Nehalem, Sandy Bridge, Ivy Bridge, Haswell, Broadwell, Skylake, Kaby Lake, Coffee Lake, and Silvermont microarchitectures have an erratum that says that MOVNTDQA from WC memory may pass earlier locked instructions. Processors based on the Core, Westmere, Sunny Cove, and Goldmont microarchitectures don't have this erratum.

    The quote from Necrolis's answer says that the lock prefix may not serialize load operations that reference weakly ordered memory types on the Pentium 4 processors. My understanding is that this looks like a bug in the Pentium 4 processors and doesn't apply to any other processors. It's worth noting, though, that it isn't documented in the spec update documents of the Pentium 4 processors.


    @PeterCordes's experiments show that, on Skylake, locked instructions don't seem to block ALU instructions from being executed out of order, while mfence does serialize ALU instructions (potentially behaving like lfence plus the store-buffer flush that a locked instruction performs). However, I think this is an implementation detail.
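
    The two kinds of barrier compared in this answer can be sketched in C as follows. This is only an illustration, assuming GCC or Clang inline assembly on x86-64; the helper names barrier_locked, barrier_mfence, and publish_then_check, and the variables flag_a and flag_b, are made up for the example and do not come from any manual. On other targets the helpers fall back to the compiler's sequentially consistent fence.

```c
/* Hypothetical helper: a full barrier built from a locked read-modify-write.
 * On x86-64 it ORs 0 into the word at the top of the stack, so the stored
 * value is unchanged and only the ordering effect of the lock prefix remains.
 * Per the errata discussed above, on many Intel processors this may not
 * order MOVNTDQA loads from WC memory. */
static inline void barrier_locked(void)
{
#if defined(__x86_64__)
    __asm__ __volatile__("lock; orl $0, (%%rsp)" ::: "memory", "cc");
#else
    __atomic_thread_fence(__ATOMIC_SEQ_CST);   /* assumed portable fallback */
#endif
}

/* Hypothetical helper: the same barrier expressed with MFENCE. */
static inline void barrier_mfence(void)
{
#if defined(__x86_64__)
    __asm__ __volatile__("mfence" ::: "memory");
#else
    __atomic_thread_fence(__ATOMIC_SEQ_CST);   /* assumed portable fallback */
#endif
}

/* Illustrative shared variables for a store-then-load (Dekker-style) step. */
static int flag_a;
static int flag_b;

/* Store to flag_a, run the chosen barrier, then load flag_b.
 * The barrier keeps the later load from being satisfied before the
 * earlier store has left the store buffer. */
static int publish_then_check(void (*barrier)(void))
{
    __atomic_store_n(&flag_a, 1, __ATOMIC_RELAXED);
    barrier();
    return __atomic_load_n(&flag_b, __ATOMIC_RELAXED);
}
```

    Calling publish_then_check(barrier_locked) or publish_then_check(barrier_mfence) gives the same store-then-load ordering for ordinary write-back memory; the two differ only in the weakly ordered WC cases and the out-of-order execution details described above.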
