Is it possible to have a release-sequence from a release store operation to a store in a different thread?

Question

I'm aware that a synchronizes-with relationship is established between a release store in thread 2 and an acquire load in thread 1 even if that load does not directly read the value stored by thread 2, provided that there is a "release sequence" between the release store and the store that is actually being read. This holds as long as:

The store that is actually being read is in the same thread as the release store operation.
In modification-order there is no store in other threads between the release store operation and the store that is actually being read (read-modify-write operations are allowed though).
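The read-modify-write clause of the second condition can be sketched as follows. This is only an illustration under my own assumptions: the function name `rmw_release_sequence_demo`, the `payload` variable and the `bumper` thread are hypothetical and do not appear in the question. The acquire load that leaves the loop reads the value written by a relaxed `fetch_add` in another thread, yet it should still synchronize with the release store that heads the sequence.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: returns the payload value the consumer observes
// once its acquire load has read flag == 2.
int rmw_release_sequence_demo() {
    int payload = 0;              // plain, non-atomic data
    std::atomic<int> flag{0};
    int seen = -1;

    std::thread producer([&] {
        payload = 42;                              // sequenced before the release store
        flag.store(1, std::memory_order_release);  // head of the release sequence
    });

    std::thread bumper([&] {
        // Relaxed loads only: this thread never synchronizes with the producer.
        while (flag.load(std::memory_order_relaxed) != 1) {
            std::this_thread::yield();
        }
        // The RMW reads 1 and writes 2, continuing the release sequence.
        flag.fetch_add(1, std::memory_order_relaxed);
    });

    std::thread consumer([&] {
        // The load that exits this loop reads the value written by the RMW.
        while (flag.load(std::memory_order_acquire) != 2) {
            std::this_thread::yield();
        }
        seen = payload;  // ordered after payload = 42 via the release sequence
    });

    producer.join();
    bumper.join();
    consumer.join();
    assert(flag.load(std::memory_order_relaxed) == 2);  // sanity check
    return seen;
}
```

If the `fetch_add` were replaced by a plain relaxed `flag.store(2, ...)` from the `bumper` thread, the first condition above would no longer be met, which is exactly the situation the rest of this question is about.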

However, I don't see any reason why it wouldn't be possible to also have synchronization when the store that is actually being read is in a different thread, provided that the release store still happens-before the store that is actually being read. Is this explicitly not allowed by the standard? If so, isn't the standard arguably incomplete, given that such a rule makes sense and all existing hardware will provide this synchronization anyway?

Consider the following example, where a, x and y are atomic ints initialized to 0.

Thread 1:

k = y.load(memory_order_acquire);
x.store(1, memory_order_relaxed);

Thread 2:

m = x.load(memory_order_relaxed);
y.store(2, memory_order_release);
a.store(2, memory_order_release);

Thread 3:

n = a.load(memory_order_acquire);
y.store(3, memory_order_relaxed);

The question is: is it possible to end up with k = 3, m = 1 and n = 2?

If there is no release sequence between the store to y in thread 2 and the store to y in thread 3, then there is no synchronizes-with between the release store to y in thread 2 and the acquire load of y in thread 1. The load of x in thread 2 therefore need not happen-before the store to x in thread 1, which makes the outcome k = 3, m = 1, n = 2 possible.

But if there is a release sequence between the store to y in thread 2 and the store to y in thread 3, then there is a synchronizes-with between the release store to y in thread 2 and the acquire load of y in thread 1. The load of x in thread 2 must then happen-before the store to x in thread 1, which makes the outcome k = 3, m = 1, n = 2 impossible. Note that if the store/load of a weren't there and we simply did the relaxed store of value 3 to y at the end of thread 2, this would be the case (so k = 3 and m = 1 would never happen together).

In this case the store of value 3 to y happens in thread 3, but there is a release-acquire synchronization using the atomic variable a; hence, if n=2 then there is a happens-before relationship between the release store of value 2 to y and the relaxed store of value 3 to y. Doesn't that mean that there is a release-sequence and a result where k=3, m=1 and n=2 will never happen?
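For anyone who wants to poke at the three-thread example natively, here is a minimal harness sketch. The names `Outcome`, `run_litmus_once` and `count_target_outcome` are my own and purely illustrative. Note the limits of such a test: never observing k = 3, m = 1, n = 2 on a particular compiler and CPU says nothing about whether the standard forbids it, and thread start-up skew makes the interesting interleavings rare in any case.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Values observed by the three loads in one run of the example.
struct Outcome { int k; int m; int n; };

// Hypothetical harness: runs thread 1, 2 and 3 of the example once.
Outcome run_litmus_once() {
    std::atomic<int> a{0}, x{0}, y{0};
    int k = 0, m = 0, n = 0;

    std::thread t1([&] {
        k = y.load(std::memory_order_acquire);
        x.store(1, std::memory_order_relaxed);
    });
    std::thread t2([&] {
        m = x.load(std::memory_order_relaxed);
        y.store(2, std::memory_order_release);
        a.store(2, std::memory_order_release);
    });
    std::thread t3([&] {
        n = a.load(std::memory_order_acquire);
        y.store(3, std::memory_order_relaxed);
    });

    t1.join();
    t2.join();
    t3.join();
    return {k, m, n};
}

// Counts how often the outcome asked about (k = 3, m = 1, n = 2) shows up.
int count_target_outcome(int trials) {
    int hits = 0;
    for (int i = 0; i < trials; ++i) {
        Outcome o = run_litmus_once();
        // Each load can only see the initial 0 or a value stored to that variable.
        assert(o.k == 0 || o.k == 2 || o.k == 3);
        assert(o.m == 0 || o.m == 1);
        assert(o.n == 0 || o.n == 2);
        if (o.k == 3 && o.m == 1 && o.n == 2) {
            ++hits;
        }
    }
    return hits;
}
```

Calling something like `count_target_outcome(100000)` from `main` and printing the result gives an empirical data point only; a model checker that enumerates all executions allowed by the memory model is the better tool for the question itself.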

Edit

Note that running the following code snippet:

int main()
{
  atomic_int a = 0;
  atomic_int x = 0;
  atomic_int y = 0;

  {{{
    {
      y.load(memory_order_acquire).readsvalue(3);
      x.store(1, memory_order_relaxed);
    }
  |||
    {
      x.load(memory_order_relaxed).readsvalue(1);
      y.store(2, memory_order_release);
      a.store(2, memory_order_release);
    }
  |||
    {
      a.load(memory_order_acquire).readsvalue(2);
      y.store(3, memory_order_relaxed);
    }
  }}}
}

on http://svr-pes20-cppmem.cl.cam.ac.uk/cppmem/ results in 1 consistent execution.

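The reason is that, in the execution graph reported by the tool, there is no rs edge from node g (the release store to y in thread 2) to node j (the relaxed store to y in thread 3), and therefore no sw/hb edge from g to d (the acquire load of y in thread 1).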

For comparison, when the relaxed write is simply placed at the end of thread 2:

int main()
{
  atomic_int a = 0;
  atomic_int x = 0;
  atomic_int y = 0;

  {{{
    {
      y.load(memory_order_acquire).readsvalue(3);
      x.store(1, memory_order_relaxed);
    }
  |||
    {
      x.load(memory_order_relaxed).readsvalue(1);
      y.store(2, memory_order_release);
      y.store(3, memory_order_relaxed);
    }
  }}}
}

Then there is no consistent execution: the candidate execution breaks causality by having node f (the load of x) read from node e (the store to x) while f happens-before e. The main difference here is that there is now an rs edge from node g to node h, which causes a synchronizes-with (sw) edge from node g to node d and therefore a happens-before (hb) edge between the same nodes.

Source: https://stackoverflow.com/questions/48292336/is-it-possible-to-have-a-release-sequence-from-a-release-store-operation-to-a-st

Tags

c++

multithreading

concurrency

memory-model

stdatomic