I ran into a problem when using overloaded kernel functions in CUDA.
I understand that CUDA selects which overloaded kernel to launch based on the arguments passed at the launch site.
However, if I