Why are CAS (Atomic) operations faster than synchronized or volatile operations

£可爱£侵袭症+ 提交于 2019-12-05 14:56:20
OldCurmudgeon

I believe the critical factor is as you state - the CAS mechanisms use low-level hardware instructions that allow for the minimal cache flushing and contention resolution.

The other two mechanisms (synchronization and volatile) use different architectural tricks that are more generally available across all different architectures.

CAS instructions are available in one form or another in most modern architectures but there will be a different implementation in each architecture.

Interesting quote from Brian Goetz (supposedly)

The relative speed of the operations is largely a non-issue. What is relevant is the difference in scalability between lock-based and non-blocking algorithms. And if you're running on a 1 or 2 core system, stop thinking about such things.

Non-blocking algorithms generally scale better because they have shorter "critical sections" than lock-based algorithms.

Note that a CAS does not necessarily have to access memory.

Most modern architectures implement a cache coherency protocol like MESI that allows the CPU to do shortcuts if there is only one thread accessing the data at the same time. The overhead compared to traditional, unsynchronized memory access is very low in this case.

When doing a lot of concurrent changes to the same value however, the caches are indeed quite worthless and all operations need to access main memory directly. In this case the overhead for synchronizing the different CPU caches and the serialization of memory access can lead to a significant performance drop (this is also known as cache ping-pong), which can be just as bad or even worse than what you experience with lock-based approaches.

So never simply assume that if you switch to atomics all your problems go away. The big advantage of atomics are the progress guarantees for lock-free (someone always makes progress) or wait-free (everyone finishes after a certain number of steps) implementations. However, this is often orthogonal to raw performance: A wait-free solution is likely to be significantly slower than a lock-based solution, but in some situations you are willing to accept that in order to get the progress guarantees.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!