Well, I have quite a delicate question :)
Let's start with what I have:
In his comment, Roger Dahl has linked the post "Passing the PTX program to the CUDA driver directly", which discusses two functions, cuModuleLoad and cuModuleLoadDataEx. The former loads PTX code from a file and passes it to the CUDA driver. The latter avoids file I/O and lets you pass the PTX code to the driver as a C string. In either case, you must already have the PTX code at your disposal, either as the result of compiling a CUDA kernel (to be loaded from file or copied and pasted into the C string) or as hand-written source.
But what if you have to create the PTX code on the fly, starting from a CUDA kernel? Following the approach in CUDA Expression templates, you can build a string containing your CUDA kernel, for example:
ss << "extern \"C\" __global__ void kernel( ";
ss << def_line.str() << ", unsigned int vector_size, unsigned int number_of_used_threads ) { \n";
ss << "\tint idx = blockDim.x * blockIdx.x + threadIdx.x; \n";
ss << "\tfor(unsigned int i = 0; i < ";
ss << "(vector_size + number_of_used_threads - 1) / number_of_used_threads; ++i) {\n";
ss << "\t\tif(idx < vector_size) { \n";
ss << "\t\t\t" << eval_line.str() << "\n";
ss << "\t\t\tidx += number_of_used_threads;\n";
ss << "\t\t}\n";
ss << "\t}\n";
ss << "}\n\n\n\n";
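To make the roles of def_line and eval_line concrete, here is a minimal sketch that wraps the string assembly above in a function. The name build_kernel_source and the element-wise addition used in the usage note are my own illustrative assumptions, not part of the linked code; def_line is assumed to hold the kernel's parameter list and eval_line the per-element statement.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper (not from the linked post): assembles the kernel source
// from a parameter list (def_line) and a per-element statement (eval_line).
std::string build_kernel_source(const std::string& def_line,
                                const std::string& eval_line) {
    std::stringstream ss;
    ss << "extern \"C\" __global__ void kernel( ";
    ss << def_line << ", unsigned int vector_size, unsigned int number_of_used_threads ) { \n";
    ss << "\tint idx = blockDim.x * blockIdx.x + threadIdx.x; \n";
    // Each thread walks the vector with a stride of number_of_used_threads.
    ss << "\tfor(unsigned int i = 0; i < ";
    ss << "(vector_size + number_of_used_threads - 1) / number_of_used_threads; ++i) {\n";
    ss << "\t\tif(idx < vector_size) { \n";
    ss << "\t\t\t" << eval_line << "\n";
    ss << "\t\t\tidx += number_of_used_threads;\n";
    ss << "\t\t}\n";
    ss << "\t}\n";
    ss << "}\n";
    return ss.str();
}
```

For instance, build_kernel_source("float* a, const float* b, const float* c", "a[idx] = b[idx] + c[idx];") would yield the source of a kernel that adds two vectors element by element.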
Then save the string to a file and compile it to PTX with nvcc through a system call:
int nvcc_exit_status = system(
    (std::string(NVCC) + " -ptx " + NVCC_FLAGS + " " + kernel_filename
     + " -o " + kernel_comp_filename).c_str()
);
if (nvcc_exit_status) {
    std::cerr << "ERROR: nvcc exits with status code: " << nvcc_exit_status << std::endl;
    exit(1);
}
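The system() call above presupposes that the kernel string has already been written to kernel_filename. A minimal sketch of that step, together with the command-line assembly, could look as follows. Both helper names (write_kernel_source, make_nvcc_command) are hypothetical, and the sketch assumes file names without spaces or shell metacharacters, since nothing is quoted.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: writes the generated kernel source to kernel_filename
// so that nvcc has a file to compile. Returns false if writing failed.
bool write_kernel_source(const std::string& kernel_filename,
                         const std::string& source) {
    std::ofstream out(kernel_filename);
    if (!out) return false;
    out << source;
    out.close();
    return !out.fail();
}

// Hypothetical helper: builds the same command line as the system() call above.
// Assumes the paths contain no spaces or shell metacharacters.
std::string make_nvcc_command(const std::string& nvcc,
                              const std::string& nvcc_flags,
                              const std::string& kernel_filename,
                              const std::string& kernel_comp_filename) {
    return nvcc + " -ptx " + nvcc_flags + " " + kernel_filename
         + " -o " + kernel_comp_filename;
}
```

The result of make_nvcc_command(...).c_str() can then be handed to system() exactly as in the snippet above.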
Finally, use cuModuleLoad and cuModuleGetFunction to load the PTX code from the file, pass it to the CUDA driver, and obtain a handle to the kernel:
result = cuModuleLoad(&cuModule, kernel_comp_filename.c_str());
assert(result == CUDA_SUCCESS);
result = cuModuleGetFunction(&cuFunction, cuModule, "kernel");
assert(result == CUDA_SUCCESS);
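If you would rather hand the generated PTX to cuModuleLoadDataEx, the file produced by nvcc first has to be read into a NUL-terminated string. A minimal sketch of that step is below; read_ptx_file is a hypothetical helper name, and the driver call is shown only as a comment because it needs cuda.h and a current CUDA context.

```cpp
#include <fstream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper: reads a whole PTX file into a std::string.
// Throws std::runtime_error if the file cannot be opened.
std::string read_ptx_file(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) {
        throw std::runtime_error("cannot open PTX file: " + path);
    }
    std::ostringstream buffer;
    buffer << in.rdbuf();
    return buffer.str();
}

// With <cuda.h> included and a context current, the string could be passed as:
//   std::string ptx = read_ptx_file(kernel_comp_filename);
//   result = cuModuleLoadDataEx(&cuModule, ptx.c_str(), 0, nullptr, nullptr);
```

Passing 0 options with null option arrays asks the driver to JIT-compile the PTX with its default settings.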
Of course, expression templates have nothing to do with this problem and I'm only quoting the source of the ideas I'm reporting in this answer.