I have a CUDA template library in which one function is actually not a template, but is defined within a .cuh header (vector_add_kernel).
If you want to keep your current code organisation, there is a very simple solution: declare your kernel static (in place of the inline keyword). This gives the kernel internal linkage, so the linker will stop complaining, but it also means one separate copy of the kernel is generated for every compilation unit (object file) that includes kernel.cuh.
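As a rough sketch of the static approach: the question's actual kernel is not shown here, so the signature and body of vector_add_kernel below, and the small main that exercises it, are assumptions made for illustration. The header part and the .cu part are shown in one file so the example compiles on its own with nvcc; error checking is omitted for brevity.

```cuda
// ==== kernel.cuh (hypothetical contents; signature and body are assumed) ====
#include <cuda_runtime.h>

// 'static' gives the kernel internal linkage: every .cu file that includes
// this header compiles its own private copy, so the linker never sees two
// definitions of the same external symbol.
static __global__ void vector_add_kernel(const float* a, const float* b,
                                         float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = a[i] + b[i];
}

// ==== any .cu file that includes kernel.cuh ====
#include <cstdio>

int main()
{
    const int n = 1024;
    float *a, *b, *out;
    // Managed memory keeps the sketch short; it is accessible from host and device.
    cudaMallocManaged(&a, n * sizeof(float));
    cudaMallocManaged(&b, n * sizeof(float));
    cudaMallocManaged(&out, n * sizeof(float));
    for (int i = 0; i < n; ++i) { a[i] = float(i); b[i] = float(2 * i); }

    vector_add_kernel<<<(n + 255) / 256, 256>>>(a, b, out, n);
    cudaDeviceSynchronize();

    // Compare against the host-side sum of the same inputs.
    bool ok = true;
    for (int i = 0; i < n; ++i)
        if (out[i] != a[i] + b[i]) ok = false;
    printf("%s\n", ok ? "ok" : "mismatch");

    cudaFree(a); cudaFree(b); cudaFree(out);
    return ok ? 0 : 1;
}
```

The trade-off is code size: with N translation units including the header, the final binary carries up to N identical copies of the kernel.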
Another solution would be to templatise your kernel. I know you already dismissed this possibility, but you should reconsider it: the float type of the kernel's input parameters is a natural candidate for a template parameter...
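A minimal sketch of the templated variant follows. Again, the kernel signature and body are assumed, and run_vector_add is a hypothetical helper added only to exercise the kernel. Because a function template may be defined in a header and included from several translation units, no static or inline keyword is needed to avoid duplicate-symbol errors.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical templated version of the kernel: T replaces the hard-coded float.
template <typename T>
__global__ void vector_add_kernel(const T* a, const T* b, T* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = a[i] + b[i];
}

// Hypothetical helper: fills two vectors, launches the kernel for type T,
// and checks the result on the host. Error checking is omitted for brevity.
template <typename T>
bool run_vector_add(int n)
{
    T *a, *b, *out;
    cudaMallocManaged(&a, n * sizeof(T));
    cudaMallocManaged(&b, n * sizeof(T));
    cudaMallocManaged(&out, n * sizeof(T));
    for (int i = 0; i < n; ++i) { a[i] = T(i); b[i] = T(2 * i); }

    vector_add_kernel<T><<<(n + 255) / 256, 256>>>(a, b, out, n);
    cudaDeviceSynchronize();

    bool ok = true;
    for (int i = 0; i < n; ++i)
        if (out[i] != a[i] + b[i]) ok = false;

    cudaFree(a); cudaFree(b); cudaFree(out);
    return ok;
}

int main()
{
    // The same source serves both precisions; each instantiation is
    // generated only for the types actually used.
    bool ok = run_vector_add<float>(1024) && run_vector_add<double>(1024);
    printf("%s\n", ok ? "ok" : "mismatch");
    return ok ? 0 : 1;
}
```

Existing call sites that pass float pointers can usually keep calling vector_add_kernel<<<...>>>(a, b, out, n) unchanged, since T is deduced from the arguments.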