问题
I don't understand, why will be problems without release sequence
, if we have 2 threads in the example below. We have only 2 operations on the atomic variable count
. count
is decremented sequently as shown in the output.
From C++ Concurrency in Action by Antony Williams:
I mentioned that you could get a
synchronizes-with relationship
between astore
to an atomic variable and aload
of that atomic variable from another thread, even when there’s a sequence ofread-modify-write
operations between thestore
and theload
, provided all the operations are suitably tagged. If the store is tagged withmemory_order_release
,memory_order_acq_rel
, ormemory_order_seq_cst
, and the load is tagged withmemory_order_consume
,memory_order_acquire
, ormemory_order_seq_cst
, and each operation in the chain loads the value written by the previous operation, then the chain of operations constitutes a release sequence and the initial storesynchronizes-with
(formemory_order_acquire
ormemory_order_seq_cst
) or isdependency-ordered-before
(formemory_order_consume
) the final load. Any atomic read-modify-write operations in the chain can have any memory ordering (evenmemory_order_relaxed
).To see what this means (release sequence) and why it’s important, consider an
atomic<int>
being used as a count of the number of items in a shared queue, as in the following listing.One way to handle things would be to have the thread that’s producingthe data store the items in a shared buffer and then do
count.store(number_of_items, memory_order_release)
#1 to let the other threads know that data is available. The threads consuming the queue items might then docount.fetch_sub(1,memory_ order_acquire)
#2 to claim an item from the queue, prior to actually reading the shared buffer #4. Once the count becomes zero, there are no more items, and the thread must wait #3.
#include <atomic>
#include <thread>
#include <vector>
#include <iostream>
#include <mutex>
std::vector<int> queue_data;
std::atomic<int> count;
std::mutex m;
void process(int i)
{
std::lock_guard<std::mutex> lock(m);
std::cout << "id " << std::this_thread::get_id() << ": " << i << std::endl;
}
void populate_queue()
{
unsigned const number_of_items = 20;
queue_data.clear();
for (unsigned i = 0;i<number_of_items;++i)
{
queue_data.push_back(i);
}
count.store(number_of_items, std::memory_order_release); //#1 The initial store
}
void consume_queue_items()
{
while (true)
{
int item_index;
if ((item_index = count.fetch_sub(1, std::memory_order_acquire)) <= 0) //#2 An RMW operation
{
std::this_thread::sleep_for(std::chrono::milliseconds(500)); //#3
continue;
}
process(queue_data[item_index - 1]); //#4 Reading queue_data is safe
}
}
int main()
{
std::thread a(populate_queue);
std::thread b(consume_queue_items);
std::thread c(consume_queue_items);
a.join();
b.join();
c.join();
}
output (VS2015):
id 6836: 19
id 6836: 18
id 6836: 17
id 6836: 16
id 6836: 14
id 6836: 13
id 6836: 12
id 6836: 11
id 6836: 10
id 6836: 9
id 6836: 8
id 13740: 15
id 13740: 6
id 13740: 5
id 13740: 4
id 13740: 3
id 13740: 2
id 13740: 1
id 13740: 0
id 6836: 7
If there’s one consumer thread, this is fine; the
fetch_sub()
is a read, withmemory_order_acquire
semantics, and the store hadmemory_order_release
semantics, so the store synchronizes-with the load and the thread can read the item from the buffer.If there are two threads reading, the second
fetch_sub()
will see the value written by the first and not the value written by the store. Without the rule about therelease sequence
, this second thread wouldn’t have ahappens-before relationship
with the first thread, and it wouldn’t be safe to read the shared buffer unless the firstfetch_sub()
also hadmemory_order_release
semantics, which would introduce unnecessary synchronization between the two consumer threads. Without therelease sequence
rule ormemory_order_release
on thefetch_sub
operations, there would be nothing to require that the stores to thequeue_data
were visible to the second consumer, and you would have a data race.
What does he mean? That both threads should see the value of count
is 20
? But in my output count
is sequently decremented in threads.
Thankfully, the first
fetch_sub()
does participate in the release sequence, and so thestore()
synchronizes-with the secondfetch_sub()
. There’s still no synchronizes-with relationship between the two consumer threads. This is shown in figure 5.7. The dotted lines in figure 5.7 show the release sequence, and the solid lines show thehappens-before relationships
回答1:
What does he mean? That both threads should see the value of count is 20? But in my output count is sequently decremented in threads.
No he doesn't. All modification to count
are atomic, so both reader threads would always see different values for it in the given code.
He's talking about the implications of the release sequence rule, namely that when a given thread performs a release
store, other multiple threads that then perform acquire
loads of the same location form a release sequence, in which each subsequent acquire
load has a happens-before relationship with the storing thread (i.e. the completion of the store happens-before the load). This means that the load operation in the reader thread is a synchronisation point with the writer thread, and all memory operations in the writer prior to the store must complete and be visible in the reader when its corresponding load completes.
He's saying that without this rule, only the first thread would be thus synchronised to the writer. The second thread would therefore have a data race in accessing queue
(note: not count
, which is protected anyway by atomic access). Theoretically, memory operations on data occurring before the store
on count
could be seen by reader thread number 2 only after its own load operation on count
. The release sequence rule assures that this will not happen.
In summary: the release sequence rules assures multiple threads can synchronise their loads on a single store. The synchronisation in question is that of memory accesses to data other than the actual atomic variable being synchronised on (which is guaranteed to be synchronised anyway due to being atomic).
Note to add here: for the most part these kind of issues are only of concern on CPU architectures that are relaxed about reordering their memory operations. The Intel architecture is not one of them: it is strongly-ordered and has only a few very specific circumstances in which memory operations can ever be reordered. These kind of nuances are mostly only relevant when talking about other architectures, such as ARM and PowerPC.
回答2:
it means that the initial store is synchronized-with the final load even if the value read by the final load isn't directly the same value stored at beginning, but it is the value modified by one of the atomic instruction which could race into. A simpler example, assuming there are three threads racing which executes these instruction (assume x initialized to 0 before the race)
// Thread 1:
A;
x.store(2, memory_order_release);
// Thread 2:
B;
int n = x.fetch_add(1, memory_order_relaxed);
C;
// Thread 3:
int m = x.load(memory_order_acquire);
D;
What are the possible values read for n
and m
according to possible results of the race? And what are the guarantees that we have on the ordering of instructions A
, B
, C
, and D
based on what we read on m
and n
?
For n
we have two cases, either 0
or 2
. For m
we could read 0
, 1
, 2
, and 3
.
There are six valid combinations of the two. Let's see each case:
m = 0, n = 0
. We don't have any synchronizes-with relationship, thus we can't infer any happens-before relationship except for the obviousB
happens-beforeC
m = 0, n = 2
. Even though thefetch_add
operation read the value written by thestore
, since thefetch_add
has arelaxed
memory ordering there is no synchronizes-with relationship between the two instruction. We can't say thatA
happens-beforeC
m = 1, n = 0
. Similarly as before, sincefetch_add
don't have arelease
semantic we can't infer a synchronizes-with relationship between thefetch_add
and theload
operation, hence we don't know whetherB
happens-beforeD
m = 2, n = 0
. The value we read with theacquire
semanticload
has been written with arelease
semanticstore
. We are guaranteed that thestore
synchronizes-with theload
, henceA
happens-beforeD
m = 2, n = 2
. Same as above, thestore
synchronizes-with theload, hence
A_happens-before_
D. As usual, the fact that the value read from
fetch_addis the same as the one
store`d from thread 1 do not imply any synchronization relationship.m = 3, n = 2
. In this case the data read by theload
has been written by thefetch_add
, and the data read by thefetch_add
has been written by thestore
. However becausefetch_add
hasrelaxed
semantic, no synchronization can be assumed betweenstore
andfetch_add
and betweenfetch_add
andload
. Apparently, in this case no synchronization can be assumed, same as the casem = 0, n = 0
. Here is where the release sequence concept comes in handy: therelease
semanticstore
in thread 1 will synchronize-with theacquire
semanticload
in thread 3 as long as the value that is being read has been written in therelease sequence
, which includes- all the stores performed later in the same thread as the release operation
- all the atomic read-modify-write operation which read a value from the same release sequence.
In this case since
fetch_add
is an atomic read-modify-write operation we know that thestore
in thread 1 synchronizes-with theload
in thread 3, and thusA
happens-beforeD
. We still can't say anything about the ordering ofB
andC
though.
In your case you have this pseoudocode, assuming number_of_items = 2
:
// Thread 1
Item[0] = ...;
Item[1] = ...;
count.store(2,memory_order_release);
// Thread 2
int i2 = 0;
while (i2 = count.fetch_sub(1,memory_order_acquire) <= 0 ) sleep();
auto x2 = Item[i2-1];
process(x2);
// Thread 3
int i3 = 0;
while (i3 = count.fetch_sub(1,memory_order_acquire) <= 0 ) sleep();
auto x3 = Item[i3-1];
process(x3);
Let's assume that the first positive value read into i2
is 2
, and thus the first positive value read into i3
is 1
. Since the value read from Thread 2 has been written from the store in Thread 1, the store synchronizes-with the load, and we know that Item[1] = ...;
from Thread 1 happens-before auto x2 = Item[1];
in Thread 2. However the value 1
read from Thread 3 has been written by Thread 2, with fetch_sub
which has no release
semantic. The fetch_sub
from Thread 2 thus does not synchronizes-with the fetch_sub
from Thread 3, however since the fetch_sub
from Thread 2 is part of the release chain of the store
in Thread 1, the store
in Thread 1 also synchronizes-with the fetch_sub
in Thread 3, from which we know that Item[0] = ...;
happens-before auto x3 = Item[0];
回答3:
fetch_sub
is read-modify-write action. it atomically reads the value from the memory address, decrement it by the argument provided and then writes it back to the memory address . it all happens atomically.
now, every atomic action reads and writes directly to the memory address. the CPU does not rely on a value cachced in the registers or the cache-lines for performance gain. it reads and writes to the memory address directly and prevent othr CPU's to do so in that time.
what "plain" (==relaxed) atomicity does not provide is reordering. both the compiler and the CPU scramble reads and writes in order to speed up the execution of the program.
look at the example below:
atomic integer i
regular integer j
Thread A:
i <- 5
//do something else
i -> j
//make some decisions regarding j value.
Thread B:
i++
if no memory order is supllied the compiler and the CPU are allowed to transform the code to
Thread A:
i -> j
i <- 5
//do something else
//make some decisions regarding j value.
Thread B:
i++
Which is of-course not what we wanted. the decision making is wrong.
what we need is memory reordering.
memory order acquire: don't scramble memory accesses before
memory order release: don't scramble memory accesses after
going back to the question:
fetch_sub
is both reading a value and writing a value. by specifying a memory order acquire
we say "I only care about the order of actions the happened before the reading"
by specifying memory order release
we say "I only care about the order of actions happened after the writing.
But you do care about memory access before and after!
if you only have one consumer thread, than the sub_fetch
does not effect anyone, because the producer anyway uses plain store
and the affects of fetch_sub
are only visible to the thread which invoked fetch_sub
. and in this case, you only care about the reading - the reading gives you the current and updated index. what happens after you store the updated index (lets say x-1
) is not that important.
but since there are two threads which read and write to counter
it is important that thread A will be aware that thread B wrote a new value to the counter and Thread B is aware that Thread A is about the read the value of counter. also vice versa- Thread B must be aware that Thread A wrote a new value to counter
and Thread A must be aware that Thread B is about to read a value from counter
you need both guarantees - every thread states that it is about to both read and write to the shared counter. the memory order you need is std::memory_order_acquire_release
.
But the example is tricky. the producer thread simply stores a new value in counter
regardless of the value which was there before. if the producer thread was to incremenet the counter each time it pushes new item - you had to use std::memory_order_acquire_release
in both the producer and the consumer threads even if you had one consumer
回答4:
i stumbled over the exact same question like you did. i thought i got the understanding right and then he comes in with this example and only uses std::memory_order_aquire. it was difficult to find any good information on this, but finally i found some helpful sources. the main information i was not aware of was the simple fact, that read-modify-write operations ALWAYS work on the newest/latest value, no matter what memory order given (even std::memory_order_relaxed). this ensures, that you wont have the same index two times in the example. still the ordering of operations can mix up (so you dont know which fetch_sub will happens before the other).
this is an answer of anthony williams himself stating that read-modify-write operations always work on the latest value: Concurrency: Atomic and volatile in C++11 memory model
additionally, someone asked about the fetch_sub in combination with the shared_ptr ref count. here anthony williams responded too and brings clarity into the situation with the reordering of the fetch_sub: https://groups.google.com/a/isocpp.org/forum/#!topic/std-discussion/OHv-oNSuJuk
来源:https://stackoverflow.com/questions/38565650/what-does-release-sequence-mean