How to use atomicCAS for multiple variables with conditionals in CUDA

杀马特。学长 韩版系。学妹 提交于 2019-12-08 11:03:28

问题


I recently encountered a simple notion in programming but i stuck when i tried to implement it in cuda. Suppose that i have thousands of elements and i want to find the closest pair between them. I use atomicMIN in global memory (suppose that we dont want to reduce) so if the distance which is calculated by each thread is smaller than the distance stored in the global variable the atomicCAS will replace it with the smaller value. For example i have the global variable float gbl_min_dist

To do this I use the following code:

__device__ inline float atomicMin(float *addr, float value){
    float old = *addr, assumed;
    if( old <= value ) return old;
    do{
        assumed = old;
        old = atomicCAS((unsigned int*)addr, __float_as_int(assumed), __float_as_int(value));
    }while( old!=assumed );
    return old;
}

Suppose now that we want to store the index of the two points that were closer together and for which the atomicMIN has successfully replaced the old minimum distance with the one calculated by those two points. What I mean is that I only want to store the indeces of the two points that currently have the smaller distance if and only if its distance has just been successfully swaped in the global variable

typedef struct {float gbl_min_dist, 
                unsigned int point1,
                unsigned int point2;} global_closest_points;

So here, when a thread executes the atomicMIN, if the value that is proposed by that tread to be compared is swapped in the gbl_min_dist then i also need to swap the p1, p2 with the values from the thread. If the gbl_min_dist is not swapped then I dont want to store the points cause this would give wrong points but correct minimum distance.

Is there any return value to check if atomicCAS has made the swap?

Any ideas on how to implement this within the atomicMIN?

Thanks in advance


回答1:


  1. You could use a critical section to have each thread have exclusive access to the data while it is updating it.
  2. Since your gbl_min_dist is a 32-bit value, if you can figure out a way to squeeze both p1 and p2 into a single 32-bit value, you could use an approach like the custom atomics answer I gave here.

If you simply use whether or not the atomicCAS made the first swap to condition additional code to update p1 and p2, I think it's still possible to have a race condition that allows your data to get out of sync between thread updates.




回答2:


You could construct a critical section to atomically update the min value and corresponding point indices. The following link gives a example on how to build the CS with atomicCAS() and atomicExch().

https://github.com/ArchaeaSoftware/cudahandbook/blob/master/memory/spinlockReduction.cu

On the other hand, I would suggest replace the atomic min operations by parallel reduction algorithm. That may improve the performance.




回答3:


The way I suggest is, rather than depend on a stored distance, recompute it any time it is critical that the stored points may have changed:

typedef struct {
    unsigned int point1, 
    unsigned int point2;
}

global_closest_points, local_closest_points, temp_c_p;

local_dist = distance(local_closest_points.point1, local_closest_points.point2);
temp_c_p = global_closest_points;
while (local_dist < distance(temp_c_p.point1, temp_c_p.point2)
    temp_c_p = atomicCAS(&global_closest_points, temp_c_p, local_closest_points);

Old habits were, save rather than recompute. But with modern processors, that's often not optimal. On CUDA, an atomic update to global memory takes more time than computing hundreds of double-precision distances.



来源:https://stackoverflow.com/questions/19363066/how-to-use-atomiccas-for-multiple-variables-with-conditionals-in-cuda

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!