Difference between mfence and asm volatile ("" : : : "memory")

醉酒成梦 2021-01-30 09:12

As far as I have understood, mfence is a hardware memory barrier while asm volatile ("" : : : "memory") is a compiler barrier. But, can asm vo

3 Answers
  • 2021-01-30 10:03

    Well, a memory barrier is only needed on architectures that have weak memory ordering. x86 and x64 don't have weak memory ordering: on x86/x64 all stores have release semantics and all loads have acquire semantics. So, for acquire/release ordering you should only really need asm volatile ("" : : : "memory").

    For a good overview of both Intel and AMD, as well as references to the relevant manufacturer specs, see http://bartoszmilewski.com/2008/11/05/who-ordered-memory-fences-on-an-x86/

    Generally things like "volatile" are used on a per-field basis where loads and stores to that field are natively atomic. Where loads and stores to a field are already atomic (i.e. the "operation" in question is a load or a store to a single field and thus the entire operation is atomic) the volatile field modifier or memory barriers are not needed on x86/x64. Portable code notwithstanding.

    When it comes to "operations" that are not atomic (e.g. loads or stores to a field that is larger than a native word, or loads or stores to multiple fields within one "operation"), a means by which the operation can be viewed as atomic is required regardless of CPU architecture. Generally this is done by means of a synchronization primitive like a mutex. Mutexes (the ones I've used) include memory barriers to avoid issues like processor reordering, so you don't have to add extra memory barrier instructions. I generally consider not using synchronization primitives a premature optimization; but, the nature of premature optimization is, of course, 97% of the time :)
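As a minimal sketch of the mutex approach described above, assuming POSIX threads are available: the `struct range` type and the `range_set` / `range_width` helpers are hypothetical names made up for illustration. Both fields are only ever touched while the lock is held, so no extra barrier instructions appear in the code.

```c
#include <pthread.h>

/* Hypothetical two-field record; readers expect lo and hi to belong together. */
struct range {
    pthread_mutex_t lock;
    long lo;
    long hi;
};

struct range shared_range = { PTHREAD_MUTEX_INITIALIZER, 0, 0 };

/* Update both fields as one unit. The lock/unlock calls also take care of
 * the compiler and CPU ordering, so no explicit fence is written here. */
void range_set(struct range *r, long lo, long hi)
{
    pthread_mutex_lock(&r->lock);
    r->lo = lo;
    r->hi = hi;
    pthread_mutex_unlock(&r->lock);
}

/* Read both fields under the same lock to get a consistent snapshot. */
long range_width(struct range *r)
{
    pthread_mutex_lock(&r->lock);
    long width = r->hi - r->lo;
    pthread_mutex_unlock(&r->lock);
    return width;
}
```

For example, after `range_set(&shared_range, 3, 10)` a call to `range_width(&shared_range)` returns 7, and a concurrent reader can never observe the new `lo` paired with the old `hi`.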

    Where you don't use a synchronization primitive and you're dealing with a multi-field invariant, memory barriers that ensure the processor does not reorder stores and loads to different memory locations are important.

    Now, in terms of not issuing an "mfence" instruction in asm volatile but using "memory" in the clobber list, this is what I've been able to read in the GCC documentation:

    If your assembler instructions access memory in an unpredictable fashion, add `memory' to the list of clobbered registers. This will cause GCC to not keep memory values cached in registers across the assembler instruction and not optimize stores or loads to that memory.

    When they say "GCC" and don't mention anything about the CPU, this means it applies to only the compiler. The lack of "mfence" means there is no CPU memory barrier. You can verify this by disassembling the resulting binary. If no "mfence" instruction is issued (depending on the target platform) then it's clear the CPU is not being told to issue a memory fence.
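To illustrate what the "memory" clobber does and does not do, here is a small sketch assuming GCC or Clang. The names `compiler_barrier`, `ready_flag` and `wait_for_flag` are hypothetical. The barrier makes the compiler reload `ready_flag` from memory on every iteration instead of keeping it in a register, but it emits no instruction at all.

```c
/* Compiler-only barrier: emits no machine instruction, but the compiler must
 * assume that any memory may have changed across it. */
#define compiler_barrier() __asm__ __volatile__("" ::: "memory")

/* Hypothetical flag that another thread or a signal handler might set. */
int ready_flag = 0;

/* Poll ready_flag up to max_spins times.
 * Returns 1 if the flag was seen set, 0 if the spin budget ran out. */
int wait_for_flag(long max_spins)
{
    for (long i = 0; i < max_spins; i++) {
        if (ready_flag)
            return 1;
        /* Without this, an optimizing compiler may load ready_flag once and
         * reuse the register copy for every iteration. */
        compiler_barrier();
    }
    return 0;
}
```

Compiling this with something like `gcc -O2 -S` and reading the assembly should show a fresh load of `ready_flag` in the loop and no `mfence`, which matches the point made above: the clobber constrains the compiler, not the CPU.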

    Depending on the platform you're on and what you're trying to do, there may be something "better" or clearer... portability notwithstanding.

  • 2021-01-30 10:04

    There are two reorderings, one is compiler reordering, the other one is CPU reordering.

    x86/x64 has a relatively strong memory model, but StoreLoad reordering (later loads passing earlier stores) CAN still happen on x86/x64. See http://en.wikipedia.org/wiki/Memory_ordering

    • asm volatile ("" ::: "memory") is just a compiler barrier.
    • asm volatile ("mfence" ::: "memory") is both a compiler barrier and CPU barrier.

    That means that if you use only a compiler barrier, you prevent compiler reordering but not CPU reordering. In other words, nothing is reordered when the source code is compiled, but reordering can still happen at runtime.
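The StoreLoad case can be sketched with the classic "store buffer" pattern. This is an illustration under assumptions, not a complete test program: the variable names and the `thread0_body` / `thread1_body` functions are hypothetical, and they are meant to be run concurrently on two cores. The non-x86 fallback to `__sync_synchronize()` is also an assumption, added only so the sketch compiles elsewhere.

```c
#if defined(__x86_64__) || defined(__i386__)
/* Compiler barrier plus the x86 MFENCE instruction. */
#define cpu_fence() __asm__ __volatile__("mfence" ::: "memory")
#else
/* Assumption: on other targets, fall back to GCC's full-barrier builtin. */
#define cpu_fence() __sync_synchronize()
#endif

volatile int sb_x = 0, sb_y = 0;    /* shared locations */
volatile int sb_r0 = -1, sb_r1 = -1; /* what each thread observed */

/* Intended for thread 0: store to x, then load y. */
void thread0_body(void)
{
    sb_x = 1;
    cpu_fence(); /* with only a compiler barrier here, the load may pass the store */
    sb_r0 = sb_y;
}

/* Intended for thread 1: store to y, then load x. */
void thread1_body(void)
{
    sb_y = 1;
    cpu_fence();
    sb_r1 = sb_x;
}
```

With `cpu_fence()` in place, the outcome `sb_r0 == 0 && sb_r1 == 0` should not be observable when the two bodies run in parallel. If the fence is replaced by `asm volatile ("" ::: "memory")`, the compiler still keeps the program order, but each CPU may satisfy its load while its own store is still sitting in the store buffer, so both threads can read 0.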

    So, which one to use depends on your needs.

  • 2021-01-30 10:05
    • asm volatile ("" ::: "memory") is just a compiler barrier.
    • asm volatile ("mfence" ::: "memory") is both a compiler barrier and an MFENCE (a CPU barrier).
    • __sync_synchronize() is also a compiler barrier and a full memory barrier.

    So asm volatile ("" ::: "memory") will not, per se, prevent the CPU from reordering independent data instructions. As pointed out, x86-64 has a strong memory model, but StoreLoad reordering is still possible. If your algorithm needs a full memory barrier to work, then you need __sync_synchronize().
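One example of an algorithm that needs a full barrier is a Dekker-style "announce, then check the other side" handshake. The sketch below assumes GCC or Clang and exactly two threads with ids 0 and 1. The `wants` array and the `try_enter` / `leave_section` functions are hypothetical names. Standard C11 code would more likely use `_Atomic` variables; plain `volatile` is used here only to stay close to the builtins discussed above.

```c
/* One intent flag per thread (hypothetical two-thread setup, ids 0 and 1). */
volatile int wants[2] = { 0, 0 };

/* Try to enter the critical section without blocking.
 * Returns 1 on success, 0 if the other thread has announced intent. */
int try_enter(int me)
{
    int other = 1 - me;

    wants[me] = 1;
    /* Full barrier: the store above must be globally visible before the load
     * below. A compiler-only barrier would still allow StoreLoad reordering
     * in the CPU, and then both threads could enter. */
    __sync_synchronize();

    if (wants[other]) {
        wants[me] = 0; /* back off so the other thread is not blocked */
        return 0;
    }
    return 1;
}

/* Leave the critical section. */
void leave_section(int me)
{
    __sync_synchronize(); /* keep critical-section accesses before the release */
    wants[me] = 0;
}
```

Both threads may fail at the same time and need to retry, but with the full barrier they should never both succeed.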
