I allocate unified memory in my program. After running, it throws "CUDA Error: out of memory", but there is still free memory

天涯浪人 2021-01-14 06:40

Before asking this, I read this question, which is similar to mine.

Here I will provide my program in detail.

#define N 70000
#define M 1000

1 answer
  •  鱼传尺愫
    2021-01-14 07:03

    If I modify your code with some instrumentation, like this:

    #include <cstdio>
    #include <cstdlib>
    #include <iostream>
    
    #define N 70000
    #define M 1000
    
    class ObjBox
    {
        public:
    
            int oid; 
            float x; 
            float y; 
            float ts;
    };
    
    class Bucket
    {
        public:
    
            int bid; 
            int nxt; 
            ObjBox *arr_obj; 
            int nO;
    };
    
    int main()
    {
    
        Bucket *arr_bkt;
        cudaMallocManaged(&arr_bkt, N * sizeof(Bucket));
    
        for (int i = 0; i < N; i++) {
            arr_bkt[i].bid = i; 
            arr_bkt[i].nxt = -1;
            arr_bkt[i].nO = 0;
    
            size_t allocsz = size_t(M) * sizeof(ObjBox);
            cudaError_t r = cudaMallocManaged(&(arr_bkt[i].arr_obj), allocsz);
            if (r != cudaSuccess) {
                printf("CUDA Error on %s\n", cudaGetErrorString(r));
                exit(0);
            } else {
                size_t total_mem, free_mem;
                cudaMemGetInfo(&free_mem, &total_mem);
                std::cout << i << ":Allocated " << allocsz;
                std::cout << " Currently " << free_mem << " bytes free" << std::endl;
            } 
    
            for (int j = 0; j < M; j++) {
                arr_bkt[i].arr_obj[j].oid = -1;
                arr_bkt[i].arr_obj[j].x = -1;
                arr_bkt[i].arr_obj[j].y = -1;
                arr_bkt[i].arr_obj[j].ts = -1;
            }
        }
    
        std::cout << "Bucket Array Initial Completed..." << std::endl;
        cudaFree(arr_bkt);
    
        return 0;
    }
    

    and compile and run it on a unified memory system with 16 GB of physical host memory and 2 GB of physical device memory with the Linux 352.39 driver, I get this:

    0:Allocated 16000 Currently 2099871744 bytes free
    1:Allocated 16000 Currently 2099871744 bytes free
    2:Allocated 16000 Currently 2099871744 bytes free
    3:Allocated 16000 Currently 2099871744 bytes free
    4:Allocated 16000 Currently 2099871744 bytes free
    5:Allocated 16000 Currently 2099871744 bytes free
    6:Allocated 16000 Currently 2099871744 bytes free
    7:Allocated 16000 Currently 2099871744 bytes free
    8:Allocated 16000 Currently 2099871744 bytes free
    9:Allocated 16000 Currently 2099871744 bytes free
    ....
    ....
    ....
    65445:Allocated 16000 Currently 1028161536 bytes free
    65446:Allocated 16000 Currently 1028161536 bytes free
    65447:Allocated 16000 Currently 1028161536 bytes free
    65448:Allocated 16000 Currently 1028161536 bytes free
    65449:Allocated 16000 Currently 1028161536 bytes free
    65450:Allocated 16000 Currently 1028161536 bytes free
    65451:Allocated 16000 Currently 1028161536 bytes free
    CUDA Error on out of memory    
    

    i.e. it reports out of memory with plenty of free memory remaining on the device.

    I think the key to understanding this is the number of allocations at the failure point, rather than their size. 65451 is suspiciously close to 65535 (i.e. 2^16 - 1). Allowing for the internal memory allocations that the runtime makes, I am going to guess that there is some sort of accidental or deliberate limit of 65535 on the total number of managed memory allocations.
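To make the numbers in that argument concrete, here is a small host-only sketch that redoes the arithmetic from the log above. The constants are copied from the output; the function names are made up for illustration, and the 2^16 ceiling is only the guess stated above, not a documented limit.

```cpp
#include <cstdint>

// Figures copied from the log above; nothing here touches the GPU.
constexpr std::uint64_t kAllocSize     = 16000;        // M * sizeof(ObjBox) = 1000 * 16 bytes
constexpr std::uint64_t kLastGoodIndex = 65451;        // last index reported as allocated
constexpr std::uint64_t kFreeAtFailure = 1028161536;   // bytes reported free just before the error

// Total bytes requested by the per-bucket allocations that succeeded (indices 0..65451).
constexpr std::uint64_t requested_bytes() {
    return (kLastGoodIndex + 1) * kAllocSize;
}

// Managed allocations made by the program itself: the per-bucket ones plus arr_bkt.
constexpr std::uint64_t managed_allocations() {
    return (kLastGoodIndex + 1) + 1;
}

// Distance from the guessed ceiling of 2^16 allocations (an assumption, see above).
constexpr std::uint64_t headroom_to_2_16() {
    return (std::uint64_t{1} << 16) - managed_allocations();
}

// The failing request is tiny compared with what the device still reports as free.
static_assert(kAllocSize < kFreeAtFailure, "size alone does not explain the failure");
```

By this count the program requested roughly 1.05 GB in 65453 managed allocations and was only a few dozen allocations short of 2^16 when the next 16000-byte request failed, which is consistent with a count limit rather than a size limit.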

    I would be very interested to see whether you can reproduce this. If you can, I would consider filing a bug report with NVIDIA.
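If the failure really does depend on the number of allocations, one way to check is to build the same data structure with far fewer of them. The following is a minimal sketch under that assumption, not a confirmed fix: it makes one allocation for all the buckets and one for all the objects, then points each `arr_obj` into the shared pool. `BucketTable`, `alloc_shared`, `free_shared`, `make_buckets` and `destroy_buckets` are hypothetical names introduced here. When built with nvcc the memory comes from `cudaMallocManaged`; with an ordinary C++ compiler it falls back to `malloc`, so the pointer layout can be checked on a machine without a GPU.

```cpp
#include <cstddef>
#include <cstdlib>

#ifdef __CUDACC__
#include <cuda_runtime.h>
#endif

struct ObjBox { int oid; float x; float y; float ts; };
struct Bucket { int bid; int nxt; ObjBox *arr_obj; int nO; };

// Managed memory under nvcc, plain host memory otherwise. Returns nullptr on failure.
inline void *alloc_shared(std::size_t bytes) {
#ifdef __CUDACC__
    void *p = nullptr;
    return cudaMallocManaged(&p, bytes) == cudaSuccess ? p : nullptr;
#else
    return std::malloc(bytes);
#endif
}

inline void free_shared(void *p) {
#ifdef __CUDACC__
    cudaFree(p);
#else
    std::free(p);
#endif
}

// Hypothetical container: the bucket array plus one pool holding every ObjBox.
struct BucketTable { Bucket *buckets; ObjBox *pool; };

// Builds n buckets of m objects each with two allocations in total instead of n + 1.
inline BucketTable make_buckets(std::size_t n, std::size_t m) {
    BucketTable t{nullptr, nullptr};
    t.buckets = static_cast<Bucket *>(alloc_shared(n * sizeof(Bucket)));
    t.pool = static_cast<ObjBox *>(alloc_shared(n * m * sizeof(ObjBox)));
    if (!t.buckets || !t.pool) {          // release whichever half succeeded
        free_shared(t.buckets);
        free_shared(t.pool);
        return BucketTable{nullptr, nullptr};
    }
    for (std::size_t i = 0; i < n; i++) {
        t.buckets[i].bid = static_cast<int>(i);
        t.buckets[i].nxt = -1;
        t.buckets[i].nO = 0;
        t.buckets[i].arr_obj = t.pool + i * m;   // slice of the shared pool
        for (std::size_t j = 0; j < m; j++) {
            t.buckets[i].arr_obj[j] = ObjBox{-1, -1.0f, -1.0f, -1.0f};
        }
    }
    return t;
}

inline void destroy_buckets(BucketTable &t) {
    free_shared(t.pool);
    free_shared(t.buckets);
    t.pool = nullptr;
    t.buckets = nullptr;
}
```

Calling `make_buckets(N, M)` with the values from the question would request about 1.12 GB for the pool in a single call. Whether that succeeds on a 2 GB device depends on the driver and on what else is using the card, so treat it as an experiment rather than a guarantee.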
