Consider the following code snippet taken from Herb Sutter's talk on atomics:
The smart_ptr class contains a pimpl object called control_block_ptr containing the refere
The Boost.Atomic library, which emulates std::atomic, provides a similar reference-counting example with an explanation that may help your understanding:
Increasing the reference counter can always be done with memory_order_relaxed: New references to an object can only be formed from an existing reference, and passing an existing reference from one thread to another must already provide any required synchronization.
It is important to enforce any possible access to the object in one thread (through an existing reference) to happen before deleting the object in a different thread. This is achieved by a "release" operation after dropping a reference (any access to the object through this reference must obviously have happened before), and an "acquire" operation before deleting the object.
It would be possible to use memory_order_acq_rel for the fetch_sub operation, but this results in unneeded "acquire" operations when the reference counter has not yet reached zero and may impose a performance penalty.
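To make the scheme described above concrete, here is a minimal sketch of it using std::atomic. The names Widget, ControlBlock, add_ref, release and destroyed_count are hypothetical and chosen only for illustration; this is not the code from the talk or from Boost, just one way the relaxed increment and the release/acquire decrement could be written.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical counter so that the destruction is observable from outside.
static int destroyed_count = 0;

// Hypothetical payload type (an assumption for this sketch).
struct Widget {
    ~Widget() { ++destroyed_count; }
};

// Hypothetical control block holding the reference count and the object.
struct ControlBlock {
    std::atomic<long> refs{1};  // the creator holds the first reference
    Widget* object;
    explicit ControlBlock(Widget* p) : object(p) {}
};

// Taking a new reference: relaxed is enough, because the caller already
// holds a reference and no other data is published by the increment.
inline void add_ref(ControlBlock* cb) {
    assert(cb != nullptr);
    cb->refs.fetch_add(1, std::memory_order_relaxed);
}

// Dropping a reference. Returns true if this call destroyed the object.
inline bool release(ControlBlock* cb) {
    assert(cb != nullptr);
    // "release": every access made through this reference happens before
    // the decrement becomes visible to other threads.
    if (cb->refs.fetch_sub(1, std::memory_order_release) == 1) {
        // "acquire": only the thread that drops the last reference pays
        // for it, and it now sees all accesses made by the other owners.
        std::atomic_thread_fence(std::memory_order_acquire);
        delete cb->object;
        delete cb;
        return true;
    }
    return false;
}
```

Using a separate acquire fence inside the branch, rather than memory_order_acq_rel on fetch_sub, is what avoids the unneeded "acquire" on every decrement that does not reach zero.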
From the C++ reference on std::memory_order:
memory_order_relaxed: Relaxed operation: there are no synchronization or ordering constraints imposed on other reads or writes, only this operation's atomicity is guaranteed.
There is also an example below on that page.
So basically, std::atomic::fetch_add() is still atomic even with std::memory_order_relaxed; therefore concurrent refs.fetch_add(1, std::memory_order_relaxed) calls from two different threads will always increment refs by 2. The memory order does not affect atomicity; it only governs how other operations (non-atomic ones, or atomic ones using std::memory_order_relaxed) may be reordered around the atomic operation on which the order is specified.
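A small sketch of that claim, assuming nothing beyond the standard library: two threads increment the same counter with std::memory_order_relaxed, and no increment is lost. The function name count_with_relaxed_increments is made up for this example.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: runs two threads that each perform `per_thread`
// relaxed increments and returns the final counter value.
long count_with_relaxed_increments(int per_thread) {
    assert(per_thread >= 0);
    std::atomic<long> refs{0};

    auto work = [&refs, per_thread] {
        for (int i = 0; i < per_thread; ++i) {
            // Atomic read-modify-write; relaxed only drops ordering
            // guarantees for *other* memory operations.
            refs.fetch_add(1, std::memory_order_relaxed);
        }
    };

    std::thread a(work);
    std::thread b(work);
    a.join();  // join() synchronizes with the end of each thread,
    b.join();  // so the load below observes every increment

    return refs.load(std::memory_order_relaxed);
}
```

Calling count_with_relaxed_increments(100000) should yield exactly 200000. With a plain non-atomic long the same loop would be a data race and could lose updates.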
As this is rather confusing (at least to me), I'm going to partially address one point:
(...) then it may happen that both threads see the value of refs to be N and both write N+1 back to it (...)
According to @AnthonyWilliams in this answer, the above sentence seems to be wrong, because:
The only way to guarantee you have the "latest" value is to use a read-modify-write operation such as exchange(), compare_exchange_strong() or fetch_add(). Read-modify-write operations have an additional constraint that they always operate on the "latest" value, so a sequence of ai.fetch_add(1) operations by a series of threads will return a sequence of values with no duplicates or gaps. In the absence of additional constraints, there's still no guarantee which threads will see which values though.
So, relying on that authority, I'd say it is impossible for both threads to read N and both write N+1 back: each fetch_add operates on the latest value, so two increments can never collapse into one.
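The "no duplicates or gaps" property can be checked with a rough sketch like the one below. The helper collect_fetch_add_results is a hypothetical name; it records the value returned by every fetch_add call across several threads and returns them sorted.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical helper: each of `threads` threads calls fetch_add(1)
// `per_thread` times and stores the returned (previous) values.
std::vector<long> collect_fetch_add_results(int threads, int per_thread) {
    assert(threads > 0 && per_thread > 0);
    std::atomic<long> ai{0};
    // One result vector per thread, so threads never write to shared data.
    std::vector<std::vector<long>> seen(threads);

    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&ai, &seen, t, per_thread] {
            seen[t].reserve(per_thread);
            for (int i = 0; i < per_thread; ++i) {
                // Even with relaxed ordering, a read-modify-write always
                // acts on the latest value in the modification order.
                seen[t].push_back(ai.fetch_add(1, std::memory_order_relaxed));
            }
        });
    }
    for (auto& th : pool) th.join();

    // Merge and sort all observed values.
    std::vector<long> all;
    for (const auto& v : seen) all.insert(all.end(), v.begin(), v.end());
    std::sort(all.begin(), all.end());
    return all;
}
```

With 4 threads and 1000 calls each, the sorted result should be exactly 0, 1, ..., 3999. Which thread gets which value is not determined, which matches the last sentence of the quote.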