Why is cudaMalloc giving me an error when I know there is sufficient memory space?

春和景丽 2021-01-12 19:00

I have a Tesla C2070 that is supposed to have 5636554752 bytes of memory.

However, this gives me an error:

int *buf_d = NULL;

err = cudaMalloc((void         


        
1 Answer
  • 2021-01-12 19:53

    The basic problem is in your question title: you don't actually know that you have sufficient memory, you are assuming you do. The runtime API includes the cudaMemGetInfo function, which returns how much free memory there is on the device. When a context is established on a device, the driver must reserve space for device code, local memory for each thread, FIFO buffers for printf support, stack for each thread, and heap for in-kernel malloc/new calls (see this answer for further details). All of this can consume rather a lot of memory, leaving you with much less than the maximum available memory after ECC reservations that you are assuming is available to your code. The API also includes cudaDeviceGetLimit, which you can use to query the amount of memory that on-device runtime support is consuming. There is also a companion call, cudaDeviceSetLimit, which allows you to change the amount of memory each component of runtime support will reserve.
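As a rough illustration of the queries described above, here is a minimal sketch that prints the free and total memory together with three of the runtime-support limits. The helper name report_device_memory is hypothetical and not part of the CUDA API; the sketch also assumes a toolkit recent enough to provide cudaDeviceGetLimit.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper (not from the original answer): reports free/total
// device memory and the runtime-support limits for the current device.
cudaError_t report_device_memory(size_t *free_bytes, size_t *total_bytes)
{
    // cudaFree(0) is a common idiom to force context creation, so the
    // numbers below already include the context's own reservations.
    cudaError_t err = cudaFree(0);
    if (err != cudaSuccess) return err;

    err = cudaMemGetInfo(free_bytes, total_bytes);
    if (err != cudaSuccess) return err;
    std::printf("free: %zu bytes, total: %zu bytes\n", *free_bytes, *total_bytes);

    const cudaLimit limits[] = { cudaLimitStackSize,
                                 cudaLimitPrintfFifoSize,
                                 cudaLimitMallocHeapSize };
    const char *names[] = { "stack size per thread",
                            "printf FIFO size",
                            "malloc heap size" };

    for (int i = 0; i < 3; ++i) {
        size_t value = 0;
        err = cudaDeviceGetLimit(&value, limits[i]);
        if (err != cudaSuccess) return err;
        std::printf("%s: %zu bytes\n", names[i], value);
    }
    return cudaSuccess;
}
```

If one of the reported limits is larger than your kernels need, a call such as cudaDeviceSetLimit(cudaLimitMallocHeapSize, bytes), made before launching any kernel that uses in-kernel malloc, may return some of that space to the free pool. How much you actually get back depends on the driver and device.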

    Even after you have tuned the runtime memory footprint to your tastes and have the actual free memory value from the driver, there are still page size granularity and fragmentation considerations to contend with. It is rarely possible to allocate every byte of what the API reports as free. Usually, I would do something like this when the objective is to try to allocate every available byte on the card:

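    #include <cuda_runtime.h>

    const size_t Mb = 1<<20; // Assuming a 1 MiB page size here

    size_t available, total;
    cudaMemGetInfo(&available, &total);

    int *buf_d = 0;
    // Start from the free memory reported by the driver, not the card total
    size_t nwords = available / sizeof(int);
    size_t words_per_Mb = Mb / sizeof(int);

    // Shrink the request by one page at a time until the allocation succeeds
    while(cudaMalloc((void**)&buf_d, nwords * sizeof(int)) == cudaErrorMemoryAllocation)
    {
        cudaGetLastError(); // clear the recorded allocation failure
        if(nwords < 2 * words_per_Mb)
        {
            // signal no free memory
            nwords = 0;
            break;
        }
        nwords -= words_per_Mb;
    }

    // leaves int buf_d[nwords] on the device or signals no free memory (nwords == 0)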

    (Note: this has never been near a compiler, and is only safe on CUDA 3 or later.) It is implicitly assumed that none of the obvious sources of problems with big allocations apply here (32-bit host operating system, WDDM Windows platform without TCC mode enabled, older known driver issues).
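Two of those caveats can be checked from the host. The sketch below is only an assumption about how one might do that: the helper name report_allocation_caveats is hypothetical, and it simply prints the host pointer width and a few cudaDeviceProp fields without interpreting them.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper (not from the original answer): prints host and device
// properties that commonly limit large allocations.
cudaError_t report_allocation_caveats(int device)
{
    // A 32-bit host process limits how large a single device buffer can be.
    std::printf("host pointer size: %zu bits\n", sizeof(void *) * 8);

    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, device);
    if (err != cudaSuccess) return err;

    // tccDriver is 1 when a Tesla device runs under the TCC driver rather
    // than WDDM; ECCEnabled is 1 when ECC is reserving part of the memory.
    std::printf("%s: totalGlobalMem = %zu bytes, TCC driver = %d, ECC enabled = %d\n",
                prop.name, prop.totalGlobalMem, prop.tccDriver, prop.ECCEnabled);
    return cudaSuccess;
}
```

Calling report_allocation_caveats(0) before attempting a large cudaMalloc gives a quick view of whether the process is 32-bit and whether the first device is running under TCC. Known driver issues cannot be detected this way.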
