Can anyone describe the differences between __global__ and __device__?
When should I use __device__, and when to use __glob
I am recording some unfounded speculations here for the time being (I will substantiate these later when I come across some authoritative source)...
__device__ functions can have a return type other than void, but __global__ functions must always return void.
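To make the return-type point concrete, here is a minimal sketch. The names square, square_kernel and square_on_gpu are made up for illustration, and error checking is left out for brevity. The __device__ helper returns a float, while the kernel has to hand its result back through memory.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: a __device__ function may return a value.
__device__ float square(float x) {
    return x * x;
}

// A kernel must be declared void, so the result is written through a pointer.
__global__ void square_kernel(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = square(in[i]);
    }
}

// Host-side driver (illustrative): squares a single value on the GPU.
float square_on_gpu(float x) {
    float* d_in = nullptr;
    float* d_out = nullptr;
    float result = 0.0f;
    cudaMalloc(&d_in, sizeof(float));
    cudaMalloc(&d_out, sizeof(float));
    cudaMemcpy(d_in, &x, sizeof(float), cudaMemcpyHostToDevice);
    square_kernel<<<1, 1>>>(d_in, d_out, 1);
    // This copy runs on the default stream, so it waits for the kernel to finish.
    cudaMemcpy(&result, d_out, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return result;
}
```

Declaring square_kernel with a non-void return type would be rejected by nvcc.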
__global__ functions can be called from within other kernels running on the GPU to launch additional GPU threads (as part of the CUDA dynamic parallelism model, aka CNP), while __device__ functions run on the same thread as the calling kernel.
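A rough sketch of a kernel launching another kernel follows. The names child_kernel, parent_kernel and run_parent are hypothetical. I am assuming a device of compute capability 3.5 or higher and a build with relocatable device code, something like nvcc -rdc=true file.cu -lcudadevrt.

```cuda
#include <cuda_runtime.h>

// Child kernel: each thread writes its own index.
__global__ void child_kernel(int* data) {
    data[threadIdx.x] = threadIdx.x;
}

// Parent kernel: a single thread launches a child grid from device code.
__global__ void parent_kernel(int* data, int n) {
    if (blockIdx.x == 0 && threadIdx.x == 0) {
        child_kernel<<<1, n>>>(data);
    }
}

// Host-side driver (illustrative): returns 0 on success, 1 on any CUDA error.
int run_parent(int* host_out, int n) {
    int* d_data = nullptr;
    cudaMalloc(&d_data, n * sizeof(int));
    parent_kernel<<<1, 1>>>(d_data, n);
    // The parent grid is not considered complete until its child grids finish.
    cudaError_t err = cudaDeviceSynchronize();
    cudaMemcpy(host_out, d_data, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_data);
    return err == cudaSuccess ? 0 : 1;
}
```

By contrast, a __device__ function called from parent_kernel would simply execute inline in the calling thread, with no new grid being launched.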
A __global__ function is the definition of a kernel. Whenever it is called from the CPU, that kernel is launched on the GPU.
However, each thread executing that kernel might need to run some piece of code again and again, for example swapping two integers. For this we can write a helper function, just as we would in a C program. For threads executing on the GPU, such a helper function should be declared as __device__.
Thus, a device function is called from the threads of a kernel, one instance per thread, while a global function is called from a CPU thread.
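The swap example above could look roughly like this. It is only a sketch: swap_ints, swap_pairs and swap_pairs_on_gpu are names I made up, and error checking is omitted.

```cuda
#include <cuda_runtime.h>

// Helper that runs inside whichever GPU thread calls it.
__device__ void swap_ints(int& a, int& b) {
    int tmp = a;
    a = b;
    b = tmp;
}

// Kernel: each thread swaps one adjacent pair, (0,1), (2,3), and so on.
__global__ void swap_pairs(int* data, int n) {
    int i = 2 * (blockIdx.x * blockDim.x + threadIdx.x);
    if (i + 1 < n) {
        swap_ints(data[i], data[i + 1]);
    }
}

// Host-side driver (illustrative): swaps adjacent pairs of host_data in place.
void swap_pairs_on_gpu(int* host_data, int n) {
    int pairs = n / 2;
    if (pairs == 0) {
        return;  // nothing to swap, and a launch with zero blocks is invalid
    }
    int* d_data = nullptr;
    cudaMalloc(&d_data, n * sizeof(int));
    cudaMemcpy(d_data, host_data, n * sizeof(int), cudaMemcpyHostToDevice);
    int threads = 128;
    int blocks = (pairs + threads - 1) / threads;
    swap_pairs<<<blocks, threads>>>(d_data, n);
    cudaMemcpy(host_data, d_data, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_data);
}
```

Calling swap_ints directly from host code would not compile. Only swap_pairs can be launched from the CPU.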
__global__ - Runs on the GPU, called from the CPU or the GPU*. Executed with <<<dim3>>> arguments.
__device__ - Runs on the GPU, called from the GPU. Can be used with variables too.
__host__ - Runs on the CPU, called from the CPU.
*) __global__ functions can be called from other __global__ functions starting with compute capability 3.5.
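As for __device__ on variables, here is a small sketch. The names hit_count, count_threads and count_launched_threads are hypothetical, and error checking is left out. The variable lives in GPU global memory and is reached from the host through the symbol copy functions of the runtime API.

```cuda
#include <cuda_runtime.h>

// A __device__ variable resides in GPU global memory.
__device__ int hit_count = 0;

// Every launched thread bumps the counter once.
__global__ void count_threads() {
    atomicAdd(&hit_count, 1);
}

// Host-side driver (illustrative): returns how many threads actually ran.
int count_launched_threads(int blocks, int threads_per_block) {
    int zero = 0;
    int result = 0;
    // The host cannot read or write hit_count directly; it copies via the symbol.
    cudaMemcpyToSymbol(hit_count, &zero, sizeof(int));
    count_threads<<<blocks, threads_per_block>>>();
    cudaMemcpyFromSymbol(&result, hit_count, sizeof(int));
    return result;
}
```

With 2 blocks of 32 threads each, the counter ends up at 64.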
__global__ is for CUDA kernels, that is, functions that are callable from the host directly. __device__ functions can be called from __global__ and __device__ functions, but not from the host.
__global__ is a CUDA C keyword (declaration specifier) which says that the function executes on the device (GPU) and is called from host (CPU) code.
Global functions (kernels) are launched by the host code using <<<no_of_blocks, no_of_threads_per_block>>>.
Each thread executes the kernel and is identified by its unique thread ID.
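A minimal sketch of such a launch, where every thread derives its own ID from the built-in variables blockIdx, blockDim and threadIdx. The names write_ids and fill_ids are hypothetical, and error checking is omitted.

```cuda
#include <cuda_runtime.h>

// Each thread computes its unique global ID and stores it at that position.
__global__ void write_ids(int* out, int n) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < n) {
        out[tid] = tid;
    }
}

// Host-side driver (illustrative): host_out must hold
// no_of_blocks * no_of_threads_per_block ints.
void fill_ids(int* host_out, int no_of_blocks, int no_of_threads_per_block) {
    int n = no_of_blocks * no_of_threads_per_block;
    int* d_out = nullptr;
    cudaMalloc(&d_out, n * sizeof(int));
    write_ids<<<no_of_blocks, no_of_threads_per_block>>>(d_out, n);
    cudaMemcpy(host_out, d_out, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
}
```

All threads run the same kernel body. The ID is the only thing that tells them which element to work on.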
However, __device__ functions cannot be called from host code. If you need to call the same function from both host and device code, declare it with both __host__ and __device__.
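For instance, a function marked with both specifiers is compiled once for the CPU and once for the GPU. This is just a sketch: clamp_to_byte, clamp_kernel and clamp_on_gpu are made-up names, and error checking is omitted.

```cuda
#include <cuda_runtime.h>

// Compiled for both sides: callable from host code and from device code.
__host__ __device__ int clamp_to_byte(int v) {
    return v < 0 ? 0 : (v > 255 ? 255 : v);
}

// Device-side use of the shared helper.
__global__ void clamp_kernel(int* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] = clamp_to_byte(data[i]);
    }
}

// Host-side driver (illustrative): clamps one value on the GPU.
int clamp_on_gpu(int v) {
    int* d_v = nullptr;
    cudaMalloc(&d_v, sizeof(int));
    cudaMemcpy(d_v, &v, sizeof(int), cudaMemcpyHostToDevice);
    clamp_kernel<<<1, 1>>>(d_v, 1);
    cudaMemcpy(&v, d_v, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_v);
    return v;
}
```

Here clamp_to_byte(300) can be called directly on the CPU, and clamp_on_gpu(300) goes through the same function on the GPU.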
Global functions are also called "kernels". They are the functions that you may call from the host side using CUDA kernel call semantics (<<<...>>>).
Device functions can only be called from other device or global functions. __device__ functions cannot be called from host code.