Does a hardware memory barrier make the visibility of atomic operations faster, in addition to providing the necessary guarantees?
TL;DR: In a producer-consumer queue, does it ever make sense to add a memory fence that is unnecessary from the C++ memory model's point of view, or to use an unnecessarily strong memory order, in order to get better latency at the expense of possibly worse throughput?

On hardware, the C++ memory model is implemented by using some kind of memory fence for the stronger memory orders and omitting it for the weaker ones. In particular, if the producer does store(memory_order_release), and the consumer observes the stored value with