CUDA atomicAdd for doubles definition error

后端未结

关注

 1  702

In previous versions of CUDA, atomicAdd was not implemented for doubles, so it is common to implement this like here. With the new CUDA 8 RC, I run into troubles when I try to c

相关标签:

1条回答

爱一瞬间的悲伤

2021-02-05 12:49
That flavor of atomicAdd is a new method introduced for compute capability 6.0. You may keep your previous implementation of other compute capabilities guarding it using macro definition
```
#if !defined(__CUDA_ARCH__) || __CUDA_ARCH__ >= 600
#else
<... place here your own pre-pascal atomicAdd definition ...>
#endif
```
This macro named architecture identification macro is documented here:

5.7.4. Virtual Architecture Identification Macro

The architecture identification macro __CUDA_ARCH__ is assigned a three-digit value string xy0 (ending in a literal 0) during each nvcc compilation stage 1 that compiles for compute_xy.

This macro can be used in the implementation of GPU functions for determining the virtual architecture for which it is currently being compiled. The host code (the non-GPU code) must not depend on it.

I assume NVIDIA did not place it for previous CC to avoid conflict for users defining it and not moving to Compute Capability >= 6.x. I would not consider it a BUG though, rather a release delivery practice.

EDIT: macro guard was incomplete (fixed) - here a complete example.
```
#if !defined(__CUDA_ARCH__) || __CUDA_ARCH__ >= 600
#else
__device__ double atomicAdd(double* a, double b) { return b; }
#endif

__device__ double s_global ;
__global__ void kernel () { atomicAdd (&s_global, 1.0) ; }


int main (int argc, char* argv[])
{
        kernel<<<1,1>>> () ;
        return ::cudaDeviceSynchronize () ;
}
```
Compilation with:
```
$> nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Wed_May__4_21:01:56_CDT_2016
Cuda compilation tools, release 8.0, V8.0.26
```
Command lines (both successful):
```
$> nvcc main.cu -arch=sm_60
$> nvcc main.cu -arch=sm_35
```
You may find why it works with the include file: sm_60_atomic_functions.h, where the method is not declared if __CUDA_ARCH__ is lower than 600.
0 讨论(0)
发布评论:

提交评论
- 加载中...