There is nothing that will guarantee that: everything is about ordering. Even memory_order_seq_cst just guarantees that things happen in a single total order. In theory, the compiler/library/CPU could delay every load from cancel_store until the end of the program.
There is a general statement in 29.3p13 of the C++11 standard:
"Implementations should make atomic stores visible to atomic loads within a reasonable amount of time."
But there is no specification of what constitutes a "reasonable amount of time".
So: memory_order_relaxed should be just fine, but memory_order_seq_cst may work better on some platforms, as the cache line may be reloaded sooner.
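
As a rough illustration of the relaxed variant, here is a minimal sketch of a cancellation flag polled by a worker thread. The names cancel_requested, poll_until_cancelled and run_and_cancel are made up for this example and are not taken from the question. It assumes the flag is the only thing being communicated; if the worker also had to see data written before the store, release/acquire ordering would be needed instead.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical cancellation flag; the name is illustrative only.
std::atomic<bool> cancel_requested{false};

// Polls the flag until a store of `true` becomes visible to this thread.
// Relaxed ordering is used because the flag carries no other data
// whose visibility would need to be ordered with it.
void poll_until_cancelled() {
    while (!cancel_requested.load(std::memory_order_relaxed)) {
        std::this_thread::yield();  // be polite while spinning
    }
}

// Starts a worker, requests cancellation after a short delay, and joins it.
// Returns the final value of the flag as seen by the calling thread.
bool run_and_cancel() {
    cancel_requested.store(false, std::memory_order_relaxed);
    std::thread worker(poll_until_cancelled);

    std::this_thread::sleep_for(std::chrono::milliseconds(10));
    cancel_requested.store(true, std::memory_order_relaxed);

    // The standard only says the store "should" become visible within a
    // reasonable time, so the point at which join() returns is a
    // quality-of-implementation matter, not a hard guarantee.
    worker.join();

    assert(cancel_requested.load(std::memory_order_relaxed));
    return cancel_requested.load(std::memory_order_relaxed);
}
```

Calling run_and_cancel() from main should return promptly on mainstream implementations. Swapping both memory_order_relaxed arguments on the flag for memory_order_seq_cst does not change what the standard promises about how soon the worker sees the store.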