Summary: I had expected that std::atomic
with std::memory_order_relaxed
would be close to the performance of jus
I do not believe that relaxed atomic loads require compare-and-swap. In the end this std::atomic implementation was not usable for my purpose, but I still wanted to have the interface, so I made my own std::atomic using MSVC's barrier intrinsics. This has better performance than the default std::atomic
for my use case. You can see the code here. It's supposed to be implemented to the C++11 spec for all the orderings for load and store. Btw GCC 4.6 is not better in this regard. I don't know about GCC 4.7.