I've started writing a new CUDA application, but I hit a funny detour along the way. The first cudaMalloc call on a variable x fails the first time. However when I call
The very first call to any CUDA library function triggers an initialisation routine. It can happen that this initialisation fails, and not the cudaMalloc itself (CUDA Programming Guide, section 3.2.1).
Later calls then seem to work despite the initial failure. I don't know your setup or your code, so I can't really help you further. Check the Programming Guide!
I would strongly recommend using the CUDA_SAFE_CALL macro, if you aren't already, to force thread synchronisation, at least while you're debugging the code:
CUDA_SAFE_CALL(cudaMalloc((void**) &(myVar), mem_size_N ));
Update: As per @talonmies, you don't need the cutil library. So let's rewrite the solution:
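/* Allocate data and keep the return code, so an allocation failure is not lost */
cudaError_t err = cudaMalloc((void**) &(myVar), mem_size_N );
/* Force thread synchronization (cudaThreadSynchronize is deprecated in newer
   toolkits; cudaDeviceSynchronize is its replacement) */
if ( cudaSuccess == err )
    err = cudaThreadSynchronize();
/* Check for and display error */
if ( cudaSuccess != err )
{
    fprintf( stderr, "Cuda error in file '%s' in line %i : %s.\n",
             __FILE__, __LINE__, cudaGetErrorString( err) );
}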
And as noted in the other answer, you may want to include the sync and check before you allocate memory, just to make sure the API initialized correctly.