Can I force cache coherency on a multicore x86 CPU?

后端 未结 9 1088
一个人的身影
一个人的身影 2020-11-29 18:43

The other week, I wrote a little thread class and a one-way message pipe to allow communication between threads (two pipes per thread, obviously, for bidirectional communica

相关标签:
9条回答
  • 2020-11-29 18:48

    There's a series of articles explaining modern memory architectures here, including Intel Core2 caches and many more modern architecture topics.

    Articles are very readable and well illustrated. Enjoy !

    0 讨论(0)
  • 2020-11-29 18:49

    volatile only forces your code to re-read the value, it cannot control where the value is read from. If the value was recently read by your code then it will probably be in cache, in which case volatile will force it to be re-read from cache, NOT from memory.

    There are not a lot of cache coherency instructions in x86. There are prefetch instructions like prefetchnta, but that doesn't affect the memory-ordering semantics. It used to be implemented by bringing the value to L1 cache without polluting L2, but things are more complicated for modern Intel designs with a large shared inclusive L3 cache.

    x86 CPUs use a variation on the MESI protocol (MESIF for Intel, MOESI for AMD) to keep their caches coherent with each other (including the private L1 caches of different cores). A core that wants to write a cache line has to force other cores to invalidate their copy of it before it can change its own copy from Shared to Modified state.


    You don't need any fence instructions (like MFENCE) to produce data in one thread and consume it in another on x86, because x86 loads/stores have acquire/release semantics built-in. You do need MFENCE (full barrier) to get sequential consistency. (A previous version of this answer suggested that clflush was needed, which is incorrect).

    You do need to prevent compile-time reordering, because C++'s memory model is weakly-ordered. volatile is an old, bad way to do this; C++11 std::atomic is a much better way to write lock-free code.

    0 讨论(0)
  • 2020-11-29 18:54

    Volatile won't do it. In C++, volatile only affects what compiler optimizations such as storing a variable in a register instead of memory, or removing it entirely.

    0 讨论(0)
  • 2020-11-29 18:55

    Cache coherence is guaranteed between cores due to the MESI protocol employed by x86 processors. You only need to worry about memory coherence when dealing with external hardware which may access memory while data is still siting on cores' caches. Doesn't look like it's your case here, though, since the text suggests you're programming in userland.

    0 讨论(0)
  • 2020-11-29 18:59

    You don't need to worry about cache coherency. The hardware will take care of that. What you may need to worry about is performance issues due to that cache coherency.

    If core#1 writes to a variable, that invalidates all other copies of the cache line in other cores (because it has to get exclusive ownership of the cache line before committing the store). When core#2 reads that same variable, it will miss in cache (unless core#1 has already written it back as far as a shared level of cache).

    Since an entire cache line (64 bytes) has to be read from memory (or written back to shared cache and then read by core#2), it will have some performance cost. In this case, it's unavoidable. This is the desired behavior.


    The problem is that when you have multiple variables in the same cache line, the processor might spend extra time keeping the caches in sync even if the cores are reading/writing different variables within the same cache line.

    That cost can be avoided by making sure those variables are not in the same cache line. This effect is known as False Sharing since you are forcing the processors to synchronize the values of objects which are not actually shared between threads.

    0 讨论(0)
  • 2020-11-29 19:04

    There are several sub-questions in your question so I'll answer them to the best of my knowledge.

    1. There currently is no portable way of implementing lock-free interactions in C++. The C++0x proposal solves this by introducing the atomics library.
    2. Volatile is not guaranteed to provide atomicity on a multicore and its implementation is vendor-specific.
    3. On the x86, you don't need to do anything special, except declare shared variables as volatile to prevent some compiler optimizations that may break multithreaded code. Volatile tells the compiler not to cache values.
    4. There are some algorithms (Dekker, for instance) that won't work even on an x86 with volatile variables.
    5. Unless you know for sure that passing access to data between threads is a major performance bottleneck in your program, stay away from lock-free solutions. Use passing data by value or locks.
    0 讨论(0)
提交回复
热议问题