I'm watching this video by Herb Sutter on GPGPU and the new C++ AMP library. He is talking about memory models and mentions Weak Memory Models and then
Google turns up some interesting results (mostly science papers) on memory models, but can someone explain what is a Weak Memory Model and what is a Strong Memory Model and their relation to concurrency?
A strong memory model is one where, from the point of view of other cores, reads and writes appear to take effect exactly as written in the program and, in particular, in program order. This is known as sequential consistency.
A weak memory model is one where memory operations may be changed by the CPU, e.g. reordered. All practical CPU architectures allow at least some memory operations to be reordered.
Note that Herb Sutter uses "strong memory model" to mean one where atomic intrinsics are not reordered. This is not the commonly accepted definition.
Yes, you are right - the difference between Weak and Strong memory models is a difference in what optimizations are available (the order of reads/writes and the related fences).
You can specify a memory model by starting with a sequentially consistent model (the most restrictive, or strongest model), and then specify how reads and writes from a single thread can be introduced, removed, or moved with respect to one another.
In this model (sequentially consistent) the memory is independent of any of the processors (threads) that use it. The memory is connected to each of the threads by a controller that feeds read and write requests from each thread. The reads and writes from a single thread reach memory in exactly the order specified by the thread, but they might be interleaved with reads and writes from other threads in an unspecified way.
Understand the Impact of Low-Lock Techniques in Multithreaded Apps
However, there's no exact boundary between strong and weak memory models, unless you compare the sequentially consistent model with all the others. Some models are just stronger or weaker than others, and the weaker ones are more open to optimization by reordering. For example, the memory model in .NET 2.0 for x86 allows a few more optimizations than the version in .NET 1.1, so it can be considered a weaker model.
The two terms aren't clearly defined, and it's not a black/white thing.
Memory models can be extremely weak, extremely strong, or anywhere in between.
It basically refers to the guarantees offered about concurrent memory accesses.
Naively, you would expect a write made on one thread to be immediately visible to all other threads. And you would expect events to appear in the same order on all threads as well.
But in a weaker memory model, neither of those may hold.
Sequential consistency is the term for a memory model which guarantees that events are seen in the same order across all threads. So a memory model which ensures sequential consistency is pretty strong.
A weaker guarantee is causal consistency: the guarantee that events are observed after the events they depend on.
In other words, if you first write a value x to some address A, and then write a second value y to the same address, then no thread will ever read the value x after reading the y value. Because the two writes are to the same address, it would violate causal consistency if not all threads observed the same order.
But this says nothing about what should happen to unrelated events. The result of writing a third value to a different memory address could be observed at absolutely any time by other threads (so different threads may observe events in a different order, unlike under sequential consistency).
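The same-address guarantee described above can be sketched in C++. This is only an illustration, written with the C++11 atomics names: the function name reads_follow_write_order and its parameter are made up for the example. It relies on the rule that all threads agree on the order of writes to a single atomic object, even when every access uses the weakest ordering (memory_order_relaxed).

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: one thread writes 1, 2, ..., last_value to a single
// atomic "address", another thread keeps reading it. Returns true if the
// reader never saw an older value after a newer one.
bool reads_follow_write_order(int last_value) {
    assert(last_value > 0);
    std::atomic<int> a{0};   // the single address "A"
    bool in_order = true;    // written only by the reader, read after join

    std::thread writer([&] {
        for (int v = 1; v <= last_value; ++v)
            a.store(v, std::memory_order_relaxed);  // no extra ordering requested
    });

    std::thread reader([&] {
        int previous = 0;
        while (previous < last_value) {
            int seen = a.load(std::memory_order_relaxed);
            if (seen < previous)
                in_order = false;  // an older write observed after a newer one
            previous = seen;
        }
    });

    writer.join();
    reader.join();
    return in_order;
}
```

Calling reads_follow_write_order(100000) should return true on any conforming implementation. Nothing comparable is promised for two different variables accessed with relaxed ordering, which is the "unrelated events" case.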
There are plenty of other such levels of "consistency", some stronger, some weaker, and offering all sorts of subtle guarantees about what you can rely on.
Fundamentally, a stronger memory model is going to offer more guarantees about the order in which events are observed, and will normally guarantee behavior closer to what you'd intuitively expect.
But a weaker model allows more room for optimization, and especially, it scales better with more cores (because less synchronization is required).
Sequential consistency is basically free on a single-core CPU, is doable on a quad-core, but would be prohibitively expensive on a 32-core system, a system with 4 physical CPUs, or a shared-memory system spanning multiple physical machines.
The more cores you have, and the further apart they are, the harder it is to ensure that they all observe events in the same order. So compromises are made, and you settle for a weaker memory model which makes looser guarantees.
In terms of concurrency, a memory model specifies the constraints on data accesses, and the conditions under which data written by one thread/core/processor becomes visible to another.
The terms weak and strong are somewhat ambiguous, but the basic premise is that a strong memory model places a lot of constraints on the hardware to ensure that writes by one thread/core/processor are visible to other threads/cores/processors in clearly-defined orders, whilst allowing the programmer maximum freedom of data access.
On the other hand, a weak model places very few constraints on the hardware, but instead places the responsibility of ensuring visibility in the hands of the programmer.
The strongest memory model is Sequential Consistency: all operations to all data by all processors form a single total order agreed on by all processors, which is consistent with the order of operations on each processor individually. This is essentially an interleaving of the operations of each processor.
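As a rough illustration of the "single total order" idea, here is the classic store-buffering test written with C++11 sequentially consistent atomics. The function name both_reads_saw_zero is invented for this sketch. In any interleaving of the four operations, at least one store comes before the other thread's load, so both loads returning 0 is impossible.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: repeats the store-buffering test `runs` times and
// reports whether both threads ever read 0 in the same run.
bool both_reads_saw_zero(int runs) {
    assert(runs > 0);
    for (int i = 0; i < runs; ++i) {
        std::atomic<int> x{0}, y{0};
        int r1 = -1, r2 = -1;

        std::thread t1([&] {
            x.store(1, std::memory_order_seq_cst);
            r1 = y.load(std::memory_order_seq_cst);
        });
        std::thread t2([&] {
            y.store(1, std::memory_order_seq_cst);
            r2 = x.load(std::memory_order_seq_cst);
        });
        t1.join();
        t2.join();

        // Under sequential consistency this outcome matches no interleaving.
        if (r1 == 0 && r2 == 0)
            return true;
    }
    return false;
}
```

With memory_order_seq_cst the function should always return false. If the four operations were relaxed instead, the language would allow r1 == 0 and r2 == 0 together, and that outcome can show up in practice even on x86, because a store may sit in a store buffer while the following load executes.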
The weakest memory model will not impose any restrictions on the order that processors see each other's writes. Different processors in the same system may see writes in different orders, and some processors may use "stale" data from their own cache for a long time after a write to the same memory address by another processor. Sometimes, whole cache lines are treated as a single unit: when a processor writes to one variable on a cache line and eventually writes that line back to memory, its stale copies of the other variables on the line are written over the top, so writes to those variables by other processors that were not yet visible to it are effectively discarded. Under such a scheme, extreme care must be taken to ensure that data is transferred to other processors in the correct order, using explicit synchronization instructions.
For example, the Intel x86 memory model is generally considered to be on the stronger end, as there are strict rules about the order in which writes become visible to other processors, whereas the DEC Alpha and ARM processors are generally considered to have weak memory models, as writes from one processor are only required to be visible to other processors in a particular order if you explicitly put ordering instructions (memory fences or barriers) in your code.
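To make "explicitly put ordering instructions in your code" concrete, here is a minimal sketch using the portable C++11 fences rather than any particular processor's barrier instructions. The function name publish_with_fences is made up for the example; the compiler is responsible for turning the fences into whatever the target needs, which may be nothing at all on a strongly ordered machine.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: one thread publishes `value` through ordinary memory,
// the other waits for a flag and then reads it. Returns what the reader saw.
int publish_with_fences(int value) {
    int payload = 0;                  // ordinary, non-atomic data
    std::atomic<bool> ready{false};   // flag, accessed with relaxed ordering only
    int received = -1;

    std::thread producer([&] {
        payload = value;
        // Release fence: the payload write may not move below the flag store.
        std::atomic_thread_fence(std::memory_order_release);
        ready.store(true, std::memory_order_relaxed);
    });

    std::thread consumer([&] {
        while (!ready.load(std::memory_order_relaxed))
            std::this_thread::yield();
        // Acquire fence: the payload read may not move above the flag load.
        std::atomic_thread_fence(std::memory_order_acquire);
        received = payload;
    });

    producer.join();
    consumer.join();
    assert(received == value);
    return received;
}
```

Without the two fences the program would have a data race on payload, and on a weakly ordered machine the consumer could see the flag set and still read the old payload.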
Some systems have memory that is only accessible by particular processors. Transferring data between these processors therefore requires explicit data transfer instructions. This is the case with the Cell processors, and is often the case with GPUs as well. This can be viewed as an extreme of a weak memory model: data is only visible to other processors if you explicitly invoke the data transfer.
Programming languages usually impose their own memory models on top of whatever is provided by the underlying processors. For example, C++0x specifies a complete set of ordering constraints ranging from completely relaxed to full sequential consistency, so you can specify in code what you require. On the other hand, Java has a very specific set of ordering constraints that must be adhered to and cannot be varied. In both cases the compiler must translate the desired constraints into the relevant instructions for the underlying processor, which may be quite involved if you request sequential consistency on a weakly ordered machine.
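As a small sketch of choosing the constraint in code, the counter below uses the weakest C++ ordering, memory_order_relaxed, because only the atomicity of each increment matters and not its order relative to other memory. The function name count_with_relaxed is invented for the example, and the names are those of the final C++11 standard.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical helper: several threads bump one shared counter with relaxed
// increments. Returns the final count.
long count_with_relaxed(int threads, int increments_per_thread) {
    assert(threads > 0 && increments_per_thread >= 0);
    std::atomic<long> counter{0};
    std::vector<std::thread> workers;

    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&] {
            for (int i = 0; i < increments_per_thread; ++i)
                counter.fetch_add(1, std::memory_order_relaxed);  // atomic, unordered
        });
    }
    for (auto& w : workers)
        w.join();  // join() makes every increment visible to this thread

    // The default ordering for load() is sequentially consistent.
    return counter.load();
}
```

For example, count_with_relaxed(4, 10000) returns 40000. Passing std::memory_order_seq_cst to fetch_add would give the same total, but it asks for a stronger guarantee, which the compiler may have to pay for with extra instructions on a weakly ordered machine.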