The C++11 standard defines a memory model (1.7, 1.10) which contains memory orderings, which are, roughly, \"sequentially-consistent\", \"acquire\", \"consume\", \"rele
Data dependency ordering was introduced by N2492 with the following rationale:
There are two significant use cases where the current working draft (N2461) does not support scalability near that possible on some existing hardware.
- read access to rarely written concurrent data structures
Rarely written concurrent data structures are quite common, both in operating-system kernels and in server-style applications. Examples include data structures representing outside state (such as routing tables), software configuration (modules currently loaded), hardware configuration (storage device currently in use), and security policies (access control permissions, firewall rules). Read-to-write ratios well in excess of a billion to one are quite common.
- publish-subscribe semantics for pointer-mediated publication
Much communication between threads is pointer-mediated, in which the producer publishes a pointer through which the consumer can access information. Access to that data is possible without full acquire semantics.
In such cases, use of inter-thread data-dependency ordering has resulted in order-of-magnitude speedups and similar improvements in scalability on machines that support inter-thread data-dependency ordering. Such speedups are possible because such machines can avoid the expensive lock acquisitions, atomic instructions, or memory fences that are otherwise required.
emphasis mine
the motivating use case presented there is rcu_dereference()
from the Linux kernel