Anything in std::atomic is wait-free?

Submitted by 狂风中的少年 on 2020-08-24 08:09:29

Question


If T is a C++ fundamental type, and if std::atomic<T>::is_lock_free() returns true, then is there anything in std::atomic<T> that is wait-free (not just lock-free)? Like, load, store, fetch_add, fetch_sub, compare_exchange_weak, and compare_exchange_strong.

Can you also answer based on what is specified in the C++ standard, and on what is implemented in Clang and/or GCC (your version of choice)?

My favorite definitions for lock-free and wait-free are taken from C++ Concurrency in Action (available for free). An algorithm is lock-free if it satisfies the first condition below, and it is wait-free if it satisfies both conditions below:

  1. If one of the threads accessing the data structure is suspended by the scheduler midway through its operation, the other threads must still be able to complete their operations without waiting for the suspended thread.
  2. Every thread accessing the data structure can complete its operation within a bounded number of steps, regardless of the behavior of other threads.

Answer 1:


There exist universally accepted definitions of lock-freedom and wait-freedom, and the definitions provided in your question are consistent with them. I would strongly assume that the C++ standard committee sticks to the definitions that are universally accepted in the scientific community.

In general, publications on lock-free/wait-free algorithms assume that CPU instructions are wait-free. Instead, the arguments about progress guarantees focus on the software algorithm.

Based on this assumption I would argue that any std::atomic method that can be translated to a single atomic instruction for some architecture is wait-free on that specific architecture. Whether such a translation is possible sometimes depends on how the method is used though. Take for example fetch_or. On x86 this can be translated to lock or, but only if you do not use its return value, because this instruction does not provide the original value. If you use the return value, then the compiler will create a CAS-loop, which is lock-free, but not wait-free. (And the same goes for fetch_and/fetch_xor.)
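To make this concrete, here is a minimal sketch (the helper functions are hypothetical and only for illustration; what actually gets emitted depends on compiler version and flags, so inspecting the generated assembly is the only way to be sure):

    #include <atomic>

    std::atomic<unsigned> flags{0};

    // Return value discarded: on x86-64, GCC/Clang can compile this to a single
    // `lock or` instruction, which is wait-free on that architecture.
    void set_bit(unsigned bit) {
        flags.fetch_or(1u << bit, std::memory_order_relaxed);
    }

    // Return value used: the compiler has to produce the original value, so it
    // typically emits a CAS loop (`lock cmpxchg`), which is lock-free but not
    // wait-free.
    unsigned set_bit_and_get_old(unsigned bit) {
        return flags.fetch_or(1u << bit, std::memory_order_relaxed);
    }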

So which methods are actually wait-free depends not only on the compiler, but mostly on the target architecture.

Whether it is technically correct to assume that a single instruction is actually wait-free is a rather philosophical question, IMHO. True, it might not be guaranteed that an instruction finishes execution within a bounded number of "steps" (whatever such a step might be), but the machine instruction is still the smallest unit on the lowest level that we can see and control. Actually, if you cannot assume that a single instruction is wait-free, then strictly speaking it is not possible to run any real-time code on that architecture, because real-time also requires strict bounds on time/the number of steps.


This is what the C++17 standard states in [intro.progress]:

Executions of atomic functions that are either defined to be lock-free (32.8) or indicated as lock-free (32.5) are lock-free executions.

  • If there is only one thread that is not blocked (3.6) in a standard library function, a lock-free execution in that thread shall complete. [ Note: Concurrently executing threads may prevent progress of a lock-free execution. For example, this situation can occur with load-locked store-conditional implementations. This property is sometimes termed obstruction-free. — end note ]
  • When one or more lock-free executions run concurrently, at least one should complete. [ Note: It is difficult for some implementations to provide absolute guarantees to this effect, since repeated and particularly inopportune interference from other threads may prevent forward progress, e.g., by repeatedly stealing a cache line for unrelated purposes between load-locked and store-conditional instructions. Implementations should ensure that such effects cannot indefinitely delay progress under expected operating conditions, and that such anomalies can therefore safely be ignored by programmers. Outside this document, this property is sometimes termed lock-free. — end note ]

Francesco Menzani correctly pointed out that my original answer was a bit imprecise, since there exist two stronger subtypes of wait-freedom.

  • wait-free - A method is wait-free if it guarantees that every call finishes its execution in a finite number of steps, i.e., it may not be possible to determine an upper bound, but the number of steps must still be guaranteed to be finite.
  • wait-free bounded - A method is wait-free bounded if it guarantees that every call finishes its execution in a bounded number of steps, where this bound may depend on the number of threads.
  • wait-free population oblivious - A method is wait-free population oblivious if it guarantees that every call finishes its execution in a bounded number of steps, and this bound does not depend on the number of threads.

So strictly speaking, the definition in the question is consistent with the definition of wait-free bounded.

In practice, most wait-free algorithms are actually wait-free bounded or even wait-free population oblivious, i.e., it is possible to determine an upper bound on the number of steps.
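To illustrate the difference between these guarantees with std::atomic itself, here is a hedged sketch (assuming an x86-64 target where an integer fetch_add compiles to a single lock xadd): the first increment is wait-free population oblivious, while the hand-written CAS loop is only lock-free, because a given thread can in principle lose the race and retry indefinitely.

    #include <atomic>

    std::atomic<long> counter{0};

    // Wait-free (population oblivious) on x86-64: fetch_add maps to a single
    // `lock xadd`, so every call completes in a constant number of steps.
    long increment_wait_free() {
        return counter.fetch_add(1, std::memory_order_relaxed);
    }

    // Only lock-free: the CAS fails whenever another thread wins the race, so
    // the number of retries of an individual thread is not bounded.
    long increment_lock_free() {
        long old = counter.load(std::memory_order_relaxed);
        while (!counter.compare_exchange_weak(old, old + 1,
                                              std::memory_order_relaxed)) {
            // on failure, `old` has been reloaded with the current value; retry
        }
        return old;
    }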




Answer 2:


Since there are many definitions of wait-freedom1 and people choose different ones, I think that a precise definition is paramount, and a distinction between its specializations is necessary and useful.

These are the universally accepted definitions of wait-freedom and its specializations:

  • wait-free: All threads will make progress in a finite number of steps.

  • wait-free bounded: All threads will make progress in a bounded number of steps, which may depend on the number of threads.

  • wait-free population oblivious3: All threads will make progress in a fixed number of steps that does not depend on the number of threads.

Overall, the C++ standard makes no distinction between lock-free and wait-free (see this other answer). It always gives guarantees no stronger than lock-free.

When std::atomic<T>::is_lock_free() returns true, the implementation uses atomic instructions, possibly inside CAS loops or LL/SC loops, instead of mutexes. Atomic instructions are wait-free; CAS and LL/SC loops are lock-free.
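For reference, lock-freedom can be queried both at compile time (C++17) and at run time; a minimal sketch, where Pair is just a hypothetical 16-byte payload chosen to show a case that may or may not be lock-free:

    #include <atomic>
    #include <cstdio>

    struct Pair { void* a; void* b; };   // 16 bytes on a typical 64-bit target

    int main() {
        // Compile-time guarantee that holds for every object of the type (C++17).
        static_assert(std::atomic<int>::is_always_lock_free,
                      "atomic<int> expected to be lock-free on this target");

        // Run-time query for a particular object; a 16-byte struct may or may not
        // be lock-free depending on the target and compiler flags (e.g. -mcx16).
        std::atomic<Pair> p{Pair{}};
        std::printf("atomic<Pair> lock-free: %d\n",
                    static_cast<int>(p.is_lock_free()));
    }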

How a method is implemented depends on many factors, including its usage, the compiler, and the target architecture and its version. For example (see the sketch after this list):

  • As someone says, on x86 gcc, fetch_add() for std::atomic<double> uses a CAS loop (lock cmpxchg), while for std::atomic<int> it uses lock add or lock xadd.
  • As someone else says, on architectures featuring LL/SC instructions, fetch_add() uses an LL/SC loop if no better instruction is available. For example, this is not the case on ARMv8.1 and above, where ldaddal is used for non-relaxed std::atomic<int> and ldadd is used if relaxed.
  • As stated in this other answer, on x86 gcc fetch_or() uses lock or if the return value is not used, otherwise it uses a CAS loop (lock cmpxchg).
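A small sketch of the first case listed above (a non-authoritative illustration: fetch_add on std::atomic<double> requires C++20, and the actual instructions depend on compiler version and flags, so checking the assembly, e.g. on Compiler Explorer, is the way to verify):

    #include <atomic>

    std::atomic<int>    i{0};
    std::atomic<double> d{0.0};

    void add_int() {
        // On x86-64 gcc this is typically a single `lock add` (result discarded)
        // or `lock xadd` (result used): wait-free on that architecture.
        i.fetch_add(1, std::memory_order_relaxed);
    }

    void add_double() {
        // fetch_add on atomic<double> (a C++20 feature) has no dedicated x86
        // instruction, so compilers emit a CAS loop around `lock cmpxchg`:
        // lock-free, but not wait-free.
        d.fetch_add(1.0, std::memory_order_relaxed);
    }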

As explained in this answer of mine (illustrated by the sketch after this list):

  • The store() method and lock add, lock xadd, lock or instructions are wait-free population-oblivious, while their "algorithm", that is the work performed by the hardware to lock the cache line, is wait-free bounded.
  • The load() method is always wait-free population-oblivious.
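As a small illustration of that claim (a sketch; on x86-64 a release store and an acquire load each typically compile to a single mov):

    #include <atomic>

    std::atomic<int> value{0};

    void publish(int v) {
        // A single store instruction (plus any required ordering): the call
        // completes in a constant number of steps regardless of other threads.
        value.store(v, std::memory_order_release);
    }

    int read() {
        // A single load instruction: likewise wait-free population-oblivious.
        return value.load(std::memory_order_acquire);
    }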

1 For example:

  • all threads will make progress in a finite number of steps (source)

  • all threads will make progress in a bounded number of steps2

  • per "step" that they all execute, all threads will make forward progress without any starvation (source)

2 It is not clear whether the bound is constant, or it may depend on the number of threads.

3 A strange name and not good for an acronym, so maybe another one should be chosen.



Source: https://stackoverflow.com/questions/61849972/anything-in-stdatomic-is-wait-free
