I would like to ask whether there is a better way to combine two atomic operations.
My goal is to find the highest results for a set of K equations (more than 32) under a lis
Is there any issue in the following code?
Yes, you can't use two atomics like that and expect coherent results. You have set up a possible race condition.
Suppose thread A does the atomicMax and replaces the old value with 100. Then thread B does the atomicMax and replaces the 100 value with 110. Then suppose thread B does the atomicCAS and replaces the index with its own. Then thread A does the atomicCAS and replaces thread B's index with thread A's index. You now have a max value of 110 with an index corresponding to thread A.
Even within a single warp, there is no stated order of execution of atomic operations.
Is there a better way?
Since your values are both 32-bit quantities, you might be interested in using a custom 64-bit atomic operation like this to update a value and an index at the same time, atomically.
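To make the packed 64-bit idea concrete, here is a minimal sketch built on an `atomicCAS` loop. It is not the code behind the link above. It assumes the value is a `float` and the index is an `int`, that every input is greater than `-FLT_MAX`, and that the value sits in the high 32 bits and the index in the low 32 bits. The names `packValIdx`, `unpackVal`, `atomicMaxValIdx`, `argmaxKernel` and `argmaxOnDevice` are hypothetical, made up for this illustration.

```cuda
#include <cuda_runtime.h>
#include <cassert>
#include <cfloat>
#include <cstring>

// Assumed layout: float value in the high 32 bits, int index in the low 32 bits.
__device__ unsigned long long packValIdx(float val, int idx) {
    unsigned long long hi = (unsigned int)__float_as_int(val);
    return (hi << 32) | (unsigned int)idx;
}

__device__ float unpackVal(unsigned long long packed) {
    return __int_as_float((int)(packed >> 32));
}

// Store (val, idx) as one 64-bit word if val is greater than the stored value.
// Value and index always change together, so they cannot get out of step.
__device__ void atomicMaxValIdx(unsigned long long *addr, float val, int idx) {
    unsigned long long candidate = packValIdx(val, idx);
    unsigned long long observed = *addr;
    while (val > unpackVal(observed)) {
        unsigned long long assumed = observed;
        observed = atomicCAS(addr, assumed, candidate);
        if (observed == assumed) break;  // our pair was stored
        // Otherwise another thread got in first: re-check against its value.
    }
}

__global__ void argmaxKernel(const float *data, int n, unsigned long long *result) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) atomicMaxValIdx(result, data[i], i);
}

// Hypothetical host wrapper; CUDA error checking is omitted for brevity.
int argmaxOnDevice(const float *hostData, int n, float *maxOut) {
    assert(n > 0);
    // Start from (-FLT_MAX, index -1) so any larger input replaces it.
    float lowest = -FLT_MAX;
    unsigned int bits;
    std::memcpy(&bits, &lowest, sizeof bits);
    unsigned long long packed = ((unsigned long long)bits << 32) | 0xFFFFFFFFull;

    float *dData;
    unsigned long long *dResult;
    cudaMalloc(&dData, n * sizeof(float));
    cudaMalloc(&dResult, sizeof(unsigned long long));
    cudaMemcpy(dData, hostData, n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dResult, &packed, sizeof packed, cudaMemcpyHostToDevice);

    int threads = 256;
    argmaxKernel<<<(n + threads - 1) / threads, threads>>>(dData, n, dResult);

    // This blocking copy also waits for the kernel to finish.
    cudaMemcpy(&packed, dResult, sizeof packed, cudaMemcpyDeviceToHost);
    cudaFree(dData);
    cudaFree(dResult);

    bits = (unsigned int)(packed >> 32);
    std::memcpy(maxOut, &bits, sizeof bits);
    return (int)(unsigned int)(packed & 0xFFFFFFFFull);
}
```

Calling `argmaxOnDevice(h, n, &m)` on a host array `h` returns the index and writes the maximum to `m`. If several elements share the maximum, whichever thread stores its pair first wins, so the returned index among ties is not deterministic.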
For large-scale usage (lots of threads) you may want to explore a classical parallel reduction. There are questions here on the CUDA tag such as this one and this one that discuss how to do an index+value reduction.
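As a rough illustration of the index+value reduction idea, here is a minimal shared-memory sketch. It is not taken from the linked questions. It assumes `float` values greater than `-FLT_MAX`, a power-of-two block size, and a final pass over the per-block results on the host. The names `BLOCK`, `blockArgmax` and `reductionArgmax` are hypothetical.

```cuda
#include <cuda_runtime.h>
#include <cassert>
#include <cfloat>
#include <vector>

#define BLOCK 256  // assumed power-of-two block size

// Each block reduces its share of the input to one (value, index) pair.
__global__ void blockArgmax(const float *data, int n,
                            float *blockBestVal, int *blockBestIdx) {
    __shared__ float sVal[BLOCK];
    __shared__ int sIdx[BLOCK];
    int tid = threadIdx.x;

    // Grid-stride loop: each thread tracks its own running maximum and index.
    float best = -FLT_MAX;
    int bestIdx = -1;
    for (int i = blockIdx.x * blockDim.x + tid; i < n; i += gridDim.x * blockDim.x) {
        if (data[i] > best) { best = data[i]; bestIdx = i; }
    }
    sVal[tid] = best;
    sIdx[tid] = bestIdx;
    __syncthreads();

    // Tree reduction in shared memory; the index always moves with its value.
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s && sVal[tid + s] > sVal[tid]) {
            sVal[tid] = sVal[tid + s];
            sIdx[tid] = sIdx[tid + s];
        }
        __syncthreads();
    }
    if (tid == 0) {
        blockBestVal[blockIdx.x] = sVal[0];
        blockBestIdx[blockIdx.x] = sIdx[0];
    }
}

// Hypothetical host wrapper; CUDA error checking is omitted for brevity.
int reductionArgmax(const float *hostData, int n, float *maxOut) {
    assert(n > 0);
    int blocks = (n + BLOCK - 1) / BLOCK;
    if (blocks > 64) blocks = 64;  // arbitrary cap; the grid-stride loop covers the rest

    float *dData, *dVal;
    int *dIdx;
    cudaMalloc(&dData, n * sizeof(float));
    cudaMalloc(&dVal, blocks * sizeof(float));
    cudaMalloc(&dIdx, blocks * sizeof(int));
    cudaMemcpy(dData, hostData, n * sizeof(float), cudaMemcpyHostToDevice);

    blockArgmax<<<blocks, BLOCK>>>(dData, n, dVal, dIdx);

    std::vector<float> hVal(blocks);
    std::vector<int> hIdx(blocks);
    cudaMemcpy(hVal.data(), dVal, blocks * sizeof(float), cudaMemcpyDeviceToHost);
    cudaMemcpy(hIdx.data(), dIdx, blocks * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dData);
    cudaFree(dVal);
    cudaFree(dIdx);

    // Finish on the host: pick the best of the per-block winners.
    int best = 0;
    for (int b = 1; b < blocks; ++b) {
        if (hVal[b] > hVal[best]) best = b;
    }
    *maxOut = hVal[best];
    return hIdx[best];
}
```

This version uses no atomics at all, so there is no contention on a single global word. The trade-off is the extra pass over the per-block results, done here on the host to keep the sketch short.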
Global atomics on Kepler are pretty fast, so depending on your exact code and reduction "density" a global atomic reduction might not be a big problem performance-wise.