Before asking this, I read this question, which is similar to mine.
Here I will provide my program in detail.
#define N 70000
#define M 1000
If I modify your code with some instrumentation, like this:
#include <cstdio>
#include <cstdlib>   // exit
#include <iostream>

#define N 70000
#define M 1000

class ObjBox
{
public:
    int oid;
    float x;
    float y;
    float ts;
};

class Bucket
{
public:
    int bid;
    int nxt;
    ObjBox *arr_obj;
    int nO;
};

int main()
{
    Bucket *arr_bkt;
    cudaMallocManaged(&arr_bkt, N * sizeof(Bucket));

    for (int i = 0; i < N; i++) {
        arr_bkt[i].bid = i;
        arr_bkt[i].nxt = -1;
        arr_bkt[i].nO = 0;

        // One separate managed allocation per bucket.
        size_t allocsz = size_t(M) * sizeof(ObjBox);
        cudaError_t r = cudaMallocManaged(&(arr_bkt[i].arr_obj), allocsz);
        if (r != cudaSuccess) {
            printf("CUDA Error on %s\n", cudaGetErrorString(r));
            exit(1);
        } else {
            // Instrumentation: report device memory after each allocation.
            size_t total_mem, free_mem;
            cudaMemGetInfo(&free_mem, &total_mem);
            std::cout << i << ":Allocated " << allocsz;
            std::cout << " Currently " << free_mem << " bytes free" << std::endl;
        }

        for (int j = 0; j < M; j++) {
            arr_bkt[i].arr_obj[j].oid = -1;
            arr_bkt[i].arr_obj[j].x = -1;
            arr_bkt[i].arr_obj[j].y = -1;
            arr_bkt[i].arr_obj[j].ts = -1;
        }
    }

    std::cout << "Bucket Array Initial Completed..." << std::endl;
    // Only the outer array is freed; the per-bucket arrays are left for process exit.
    cudaFree(arr_bkt);
    return 0;
}
and compile and run it on a unified memory system with 16 GB of physical host memory and 2 GB of physical device memory, using the Linux 352.39 driver, I get this:
0:Allocated 16000 Currently 2099871744 bytes free
1:Allocated 16000 Currently 2099871744 bytes free
2:Allocated 16000 Currently 2099871744 bytes free
3:Allocated 16000 Currently 2099871744 bytes free
4:Allocated 16000 Currently 2099871744 bytes free
5:Allocated 16000 Currently 2099871744 bytes free
6:Allocated 16000 Currently 2099871744 bytes free
7:Allocated 16000 Currently 2099871744 bytes free
8:Allocated 16000 Currently 2099871744 bytes free
9:Allocated 16000 Currently 2099871744 bytes free
....
....
....
65445:Allocated 16000 Currently 1028161536 bytes free
65446:Allocated 16000 Currently 1028161536 bytes free
65447:Allocated 16000 Currently 1028161536 bytes free
65448:Allocated 16000 Currently 1028161536 bytes free
65449:Allocated 16000 Currently 1028161536 bytes free
65450:Allocated 16000 Currently 1028161536 bytes free
65451:Allocated 16000 Currently 1028161536 bytes free
CUDA Error on out of memory
i.e. it reports out of memory while roughly 1 GB of device memory is still reported as free.
I think the key to understanding this is the number of allocations at the failure point, rather than their size. 65451 is suspiciously close to 65535 (i.e. 2^16 - 1). Allowing for the internal memory allocations that the runtime makes, my guess is that there is some sort of accidental or deliberate limit of 65535 on the total number of managed memory allocations.
I would be very interested to see whether you can reproduce this. If you can, I would consider filing a bug report with NVIDIA.