In previous versions of CUDA, atomicAdd was not implemented for doubles, so it is common to implement this like here. With the new CUDA 8 RC, I run into troubles when I try to c
That flavor of atomicAdd is a new method introduced for compute capability 6.0. You may keep your previous implementation of other compute capabilities guarding it using macro definition
#if !defined(__CUDA_ARCH__) || __CUDA_ARCH__ >= 600
#else
<... place here your own pre-pascal atomicAdd definition ...>
#endif
This macro named architecture identification macro is documented here:
5.7.4. Virtual Architecture Identification Macro
The architecture identification macro
__CUDA_ARCH__
is assigned a three-digit value string xy0 (ending in a literal 0) during each nvcc compilation stage 1 that compiles for compute_xy.This macro can be used in the implementation of GPU functions for determining the virtual architecture for which it is currently being compiled. The host code (the non-GPU code) must not depend on it.
I assume NVIDIA did not place it for previous CC to avoid conflict for users defining it and not moving to Compute Capability >= 6.x. I would not consider it a BUG though, rather a release delivery practice.
EDIT: macro guard was incomplete (fixed) - here a complete example.
#if !defined(__CUDA_ARCH__) || __CUDA_ARCH__ >= 600
#else
__device__ double atomicAdd(double* a, double b) { return b; }
#endif
__device__ double s_global ;
__global__ void kernel () { atomicAdd (&s_global, 1.0) ; }
int main (int argc, char* argv[])
{
kernel<<<1,1>>> () ;
return ::cudaDeviceSynchronize () ;
}
Compilation with:
$> nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Wed_May__4_21:01:56_CDT_2016
Cuda compilation tools, release 8.0, V8.0.26
Command lines (both successful):
$> nvcc main.cu -arch=sm_60
$> nvcc main.cu -arch=sm_35
You may find why it works with the include file: sm_60_atomic_functions.h
, where the method is not declared if __CUDA_ARCH__
is lower than 600.