CUDA atomicAdd for doubles definition error

我寻月下人不归 2021-02-05 12:12

In previous versions of CUDA, atomicAdd was not implemented for doubles, so it is common to implement this like here. With the new CUDA 8 RC, I run into troubles when I try to c

1 answer
  • 2021-02-05 12:49

    That overload of atomicAdd is new: it was introduced for compute capability 6.0. You can keep your previous implementation for lower compute capabilities by guarding it with a macro check:

    #if !defined(__CUDA_ARCH__) || __CUDA_ARCH__ >= 600
    #else
    <... place here your own pre-pascal atomicAdd definition ...>
    #endif
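
    For the placeholder above, a commonly used fallback is the compare-and-swap loop given in the CUDA C Programming Guide. The following is a minimal sketch assuming that approach. The names g_sum, add_ones and run_add_ones are hypothetical and exist only for illustration, and the guard is written in its positive form, which should be logically equivalent to the #else branch of the guard above.

```cuda
#include <cuda_runtime.h>

// Fallback used only in device passes that target compute capability < 6.0.
// The host pass and sm_60+ passes rely on the declaration shipped with the toolkit.
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ < 600
__device__ double atomicAdd(double* address, double val)
{
    unsigned long long int* address_as_ull = (unsigned long long int*)address;
    unsigned long long int old = *address_as_ull;
    unsigned long long int assumed;
    do {
        assumed = old;
        // Store (assumed + val) only if the slot still holds `assumed`.
        old = atomicCAS(address_as_ull, assumed,
                        __double_as_longlong(val + __longlong_as_double(assumed)));
        // Compare the raw bits so that a NaN value cannot cause an endless loop.
    } while (assumed != old);
    return __longlong_as_double(old);  // value stored before the addition
}
#endif

__device__ double g_sum;  // hypothetical accumulator in device memory

// Every thread adds 1.0 to the shared accumulator.
__global__ void add_ones() { atomicAdd(&g_sum, 1.0); }

// Hypothetical host helper: resets the accumulator, launches
// blocks * threads threads and returns the accumulated sum.
double run_add_ones(int blocks, int threads)
{
    double zero = 0.0;
    double result = -1.0;
    cudaMemcpyToSymbol(g_sum, &zero, sizeof(double));
    add_ones<<<blocks, threads>>>();
    cudaDeviceSynchronize();
    cudaMemcpyFromSymbol(&result, g_sum, sizeof(double));
    return result;
}
```

    Calling run_add_ones(4, 64) launches 256 threads, so the returned sum should be 256.0 whether the file is built for a pre-Pascal target or for sm_60 and later. Error checking of the runtime calls is left out to keep the sketch short.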
    

    __CUDA_ARCH__ is the architecture identification macro, documented here:

    5.7.4. Virtual Architecture Identification Macro

    The architecture identification macro __CUDA_ARCH__ is assigned a three-digit value string xy0 (ending in a literal 0) during each nvcc compilation stage 1 that compiles for compute_xy.

    This macro can be used in the implementation of GPU functions for determining the virtual architecture for which it is currently being compiled. The host code (the non-GPU code) must not depend on it.
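
    As an illustration of the quoted rule, the sketch below reads __CUDA_ARCH__ inside a kernel and copies the value back to the host. The names report_arch and compiled_arch are hypothetical, and the -1 branch is only an assumption about how to keep the host pass well-formed; it is never executed on the GPU.

```cuda
#include <cuda_runtime.h>

// Writes the virtual architecture this device code was compiled for.
__global__ void report_arch(int* out)
{
#ifdef __CUDA_ARCH__
    *out = __CUDA_ARCH__;  // three-digit value xy0, e.g. 350 for compute_35
#else
    *out = -1;             // seen by the host pass only
#endif
}

// Hypothetical host helper: launches the kernel and returns the reported value,
// or -1 if the kernel did not run.
int compiled_arch()
{
    int* d_out = nullptr;
    int h_out = -1;
    cudaMalloc(&d_out, sizeof(int));
    report_arch<<<1, 1>>>(d_out);
    cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return h_out;
}
```

    The host function itself never branches on __CUDA_ARCH__; it only receives the value computed by device code, which is in line with the requirement that host code must not depend on the macro.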

    I assume NVIDIA did not provide it for earlier compute capabilities to avoid conflicts with users who define their own version and do not move to compute capability >= 6.x. I would not consider it a bug, though, but rather a release delivery practice.

    EDIT: the macro guard was incomplete (now fixed). Here is a complete example:

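    // Nothing to define for the host pass or for sm_60+: the toolkit declares atomicAdd(double*, double) there.
    #if !defined(__CUDA_ARCH__) || __CUDA_ARCH__ >= 600
    #else
    // Placeholder fallback for pre-Pascal targets; it only shows that the guard compiles and does not add anything.
    __device__ double atomicAdd(double* a, double b) { return b; }
    #endif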
    
    __device__ double s_global ;
    __global__ void kernel () { atomicAdd (&s_global, 1.0) ; }
    
    
    int main (int argc, char* argv[])
    {
            kernel<<<1,1>>> () ;
            return ::cudaDeviceSynchronize () ;
    }
    

    Compilation with:

    $> nvcc --version
    nvcc: NVIDIA (R) Cuda compiler driver
    Copyright (c) 2005-2016 NVIDIA Corporation
    Built on Wed_May__4_21:01:56_CDT_2016
    Cuda compilation tools, release 8.0, V8.0.26
    

    Command lines (both successful):

    $> nvcc main.cu -arch=sm_60
    $> nvcc main.cu -arch=sm_35
    

    To see why this works, look at the include file sm_60_atomic_functions.h, where the function is not declared if __CUDA_ARCH__ is lower than 600.
