Question
I understand that template functions usually have to be declared and defined in header files.
The problem I am having is that my template function calls other functions. The prototypes of those functions are in the same header file, before the template function itself.
Here is that portion of the code:
//header.h
template <int ignoreAdetection>
__global__ void MCMLKernel(SimState d_state, GPUThreadStates tstates)
{
    // photon structure stored in registers
    PhotonStructGPU photon;
    // random number seeds
    UINT64 rnd_x;
    UINT32 rnd_a;
    // Flag to indicate if this thread is active
    UINT32 is_active;
    // Restore the thread state from global memory.
    RestoreThreadState(&d_state, &tstates, &photon, &rnd_x, &rnd_a, &is_active);
    ...
    ...
}
The function RestoreThreadState is the first of several functions called from this template function. The others are called within a for loop.
I am not sure whether this template function should be in the header file. If it should, how do I call those other functions?
The errors I am getting from the compiler during the instantiation of MCMLKernel:
- error: explicit type is missing ("int" assumed)
- error: variable "RestoreThreadState" may not be initialized
- error: a value of type "SimState *" cannot be used to initialize an entity of type "int"
- error: expected a ")"
- warning: declaration is incompatible with previous "RestoreThreadState"
Additional details: all of these functions are CUDA functions. MCMLKernel is a __global__ kernel and the functions it calls are __device__ functions. I am using Nsight Eclipse Edition and compute capability 1.3 GPUs (four Tesla C1060 cards).
Answer 1:
__device__ void RestoreThreadState(...);
is a forward declaration, meaning that a proper definition follows in the same translation unit (it may even come after the function's invocation).
__device__ void RestoreThreadState(...) {}
is a definition of a no-op function.
__device__ int HitBoundary(PhotonStructGPU *photon);
is also a forward declaration. Do you supply a definition?
Update
Here's a Gist based on the code snippets you provided. I kept the kernel, the device functions and the type declarations in 3 separate headers. I can compile it (though obviously it can't be run).
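To illustrate the difference between a declaration and a definition, here is a minimal sketch in plain host C++ (the __device__ and __global__ qualifiers are dropped so an ordinary C++ compiler accepts it). The names Photon, RestoreState and Step, and the struct fields, are simplified stand-ins made up for this example, not the real MCML types. The point is only that a declaration ending in a semicolon is enough for the template to compile, and the real definition can come later in the same translation unit.

```cpp
#include <cassert>

// Hypothetical, simplified stand-in for PhotonStructGPU.
struct Photon { float x; unsigned layer; };

// Declarations only: enough for the template below to compile.
void RestoreState(const Photon* saved, Photon* photon);
int HitBoundary(const Photon* photon);

// The template comes after the declarations, so both names are known here.
template <int ignoreDetection>
int Step(const Photon* saved)
{
    Photon photon;
    RestoreState(saved, &photon);
    // ignoreDetection is a compile-time constant; 0 means "do the boundary check".
    return ignoreDetection ? 0 : HitBoundary(&photon);
}

// The definitions may follow the template, as long as they end up in the
// same translation unit (or are linked in, for host code).
void RestoreState(const Photon* saved, Photon* photon)
{
    assert(saved != nullptr && photon != nullptr);
    *photon = *saved;
}

// Invented placeholder logic: any layer above 0 counts as a hit.
int HitBoundary(const Photon* photon)
{
    return photon->layer > 0 ? 1 : 0;
}
```

Calling Step<0>(&p) runs the boundary check, while Step<1>(&p) skips it, which mirrors how the integer template parameter selects a kernel variant.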
Answer 2:
@Eugene provided the answer to this question.
I declared the prototype of the called function like this:
__device__ void RestoreThreadState(SimState *d_state, GPUThreadStates *tstates,
                                   PhotonStructGPU *photon,
                                   UINT64 *rnd_x, UINT32 *rnd_a,
                                   UINT32 *is_active);
But, following the example given by Eugene, it should be like this (NOT A PROTOTYPE! See the other answer):
__device__ void RestoreThreadState(SimState *d_state, GPUThreadStates *tstates,
                                   PhotonStructGPU *photon,
                                   UINT64 *rnd_x, UINT32 *rnd_a,
                                   UINT32 *is_active){}
Note the {} at the end.
Next, the code calls functions that actually return values, like this one:
__device__ int HitBoundary(PhotonStructGPU *photon);
It is called from the same MCMLKernel template function, and it gives me a warning:
- warning: missing return statement at end of non-void function "HitBoundary"
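That warning is what a compiler typically reports when a non-void function is given the empty {} body: control reaches the closing brace without returning a value, which is undefined behavior in C++. Below is a small sketch in plain host C++ (qualifiers omitted; Photon and its z field are hypothetical stand-ins, and the boundary test is invented). It keeps the semicolon-terminated declaration where the template needs it and gives the function a single real definition that returns on every path.

```cpp
#include <cassert>

// Hypothetical stand-in for PhotonStructGPU; only one field is needed here.
struct Photon { float z; };

// Declaration: no body, so there is nothing for the compiler to warn about.
int HitBoundary(const Photon* photon);

// An empty stub such as
//     int HitBoundary(const Photon* photon) {}
// is a definition that falls off the end without returning, hence the warning.

// A real definition returns a value on every path.
// The test (z below zero) is an invented placeholder, not the MCML logic.
int HitBoundary(const Photon* photon)
{
    assert(photon != nullptr);
    return photon->z < 0.0f ? 1 : 0;
}
```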
UPDATE: In another source file, kernel.cu, I have the following declaration and definition (which is why I have multiple-definition problems):
__device__ void RestoreThreadState(SimState *d_state, GPUThreadStates *tstates,
                                   PhotonStructGPU *photon,
                                   UINT64 *rnd_x, UINT32 *rnd_a,
                                   UINT32 *is_active)
{
    UINT32 tid = blockIdx.x * NUM_THREADS_PER_BLOCK + threadIdx.x;
    *rnd_x = d_state->x[tid];
    *rnd_a = d_state->a[tid];
    photon->x = tstates->photon_x[tid];
    photon->y = tstates->photon_y[tid];
    photon->z = tstates->photon_z[tid];
    photon->ux = tstates->photon_ux[tid];
    photon->uy = tstates->photon_uy[tid];
    photon->uz = tstates->photon_uz[tid];
    photon->w = tstates->photon_w[tid];
    photon->sleft = tstates->photon_sleft[tid];
    photon->layer = tstates->photon_layer[tid];
    *is_active = tstates->is_active[tid];
}
IN SUMMARY: I have four source files
- main.cu
- kernel.cu
- rng.cu
- mem.cu
Each source file except main.cu has an associated header:
- kernel.cuh
- rng.cuh
- mem.cuh
where I want to forward-declare functions to use in main.cu.
Everything is fine until I get to the template function which calls functions from kernel.cu and rng.cu.
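One thing worth checking for this layout: as far as I know, separate compilation of device code needs compute capability 2.0 or newer, so on compute capability 1.3 cards the bodies of the __device__ functions have to be visible in the translation unit that instantiates the kernel. One possible arrangement, sketched here in plain host C++ (qualifiers omitted so a regular compiler accepts it), is to define the helper in the header, ahead of the template. ThreadStates is trimmed to two fields and RunThread is a hypothetical stand-in for MCMLKernel; the inline keyword is what lets ordinary C++ include the header from several source files without multiple-definition errors.

```cpp
// ---- kernel.cuh (sketch) ----
#ifndef KERNEL_CUH
#define KERNEL_CUH

#include <cassert>

// Hypothetical, trimmed-down versions of the types from the question.
struct ThreadStates { const float* photon_x; const unsigned* is_active; };
struct Photon { float x; };

// Defined in the header, before the template that calls it.
inline void RestoreThreadState(const ThreadStates* tstates, unsigned tid,
                               Photon* photon, unsigned* is_active)
{
    assert(tstates != nullptr && photon != nullptr && is_active != nullptr);
    photon->x = tstates->photon_x[tid];
    *is_active = tstates->is_active[tid];
}

// Stand-in for the template kernel: restores one thread's state and
// reports whether that thread is active.
template <int ignoreAdetection>
unsigned RunThread(const ThreadStates* tstates, unsigned tid, Photon* photon)
{
    unsigned is_active = 0;
    RestoreThreadState(tstates, tid, photon, &is_active);
    return is_active;
}

#endif // KERNEL_CUH
```

With this arrangement, a source file that includes the header must not define RestoreThreadState a second time, otherwise the same translation unit sees two bodies for it.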
Source: https://stackoverflow.com/questions/19170056/template-function-calls-to-other-functions