How to allocate all available global memory on the GeForce GTX 690 device?

Submitted by 丶灬走出姿态 on 2019-12-31 03:48:26

Question


I need to allocate all available global memory on a device using CUDA. On the Tesla C2050, Quadro 600, and GeForce GTX 560 Ti I do it in two steps. First, I allocate 0 bytes of global memory on the device. Second, I query the available memory with the cudaMemGetInfo function and allocate that amount. This works on the devices listed above, but it does not work on the GeForce GTX 690.

Could somebody tell me what mechanism or paradigm I can use to allocate all available memory on the GeForce GTX 690?

It looks like this:

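cudaSetDevice(deviceIndex);

// Step 1: allocate 0 bytes of global memory on the device.
int *reservedMemory;
cudaMalloc(&reservedMemory, 0);

// Step 2: query the free memory and try to allocate all of it.
size_t freeMemory, totalMemory;
cudaMemGetInfo(&freeMemory, &totalMemory);
cudaMalloc(&reservedMemory, freeMemory);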

On the GeForce GTX 690, one of the two GPUs has 2147483648 bytes of memory, and 2050109440 bytes of global memory are reported as free, but I can allocate only 1341915136 bytes. On the Quadro 600, the single GPU has 1073414144 bytes of memory, and I can allocate all 859803648 bytes that are reported as free.
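To put the figures above side by side, here is a minimal host-only sketch that turns them into fractions. The constant names and the `fraction` helper are made up for illustration; only the byte counts come from the question.

```cpp
#include <cstdint>

// Byte counts quoted in the question (names are illustrative only).
const std::uint64_t kGtx690Total        = 2147483648ULL;
const std::uint64_t kGtx690Free         = 2050109440ULL;
const std::uint64_t kGtx690Allocated    = 1341915136ULL;
const std::uint64_t kQuadro600Total     = 1073414144ULL;
const std::uint64_t kQuadro600Free      = 859803648ULL;
const std::uint64_t kQuadro600Allocated = 859803648ULL;

// Returns part / whole as a floating-point fraction (0.0 if whole is zero).
double fraction(std::uint64_t part, std::uint64_t whole) {
    return whole == 0 ? 0.0
                      : static_cast<double>(part) / static_cast<double>(whole);
}
```

Calling `fraction(kGtx690Allocated, kGtx690Free)` gives roughly 0.65, so about a third of the memory reported as free on the GTX 690 could not be obtained in one allocation, while the same ratio on the Quadro 600 is exactly 1.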

For example, on the Quadro 600 (compilation, linking, and execution shown):

D:\Gdmt> nvcc -arch=compute_20 -code=sm_21 -c ./Gdmt.cu -o ./Gdmt.obj
Gdmt.cu
tmpxft_00000bb4_00000000-3_Gdmt.cudafe1.gpu
tmpxft_00000bb4_00000000-8_Gdmt.cudafe2.gpu
Gdmt.cu
tmpxft_00000bb4_00000000-3_Gdmt.cudafe1.cpp
tmpxft_00000bb4_00000000-14_Gdmt.ii

D:\Gdmt> nvcc ./Gdmt.obj -o ./Gdmt.exe

D:\Gdmt> nvcc -arch=compute_20 -code=sm_21 -c ./Gdmt_additional.cu -o ./Gdmt_additional.obj
Gdmt_additional.cu
tmpxft_00000858_00000000-3_Gdmt_additional.cudafe1.gpu
tmpxft_00000858_00000000-8_Gdmt_additional.cudafe2.gpu
Gdmt_additional.cu
tmpxft_00000858_00000000-3_Gdmt_additional.cudafe1.cpp
tmpxft_00000858_00000000-14_Gdmt_additional.ii

D:\Gdmt> nvcc ./Gdmt_additional.obj -o ./Gdmt_additional.exe

D:\Gdmt> Gdmt.exe
Total amount of memory: 1073414144 Bytes;
Memory to reserve: 859803648 Bytes;
Memory reserved: 859803648 Bytes;
^C
D:\Gdmt> Gdmt_additional.exe
Allocation is succeeded on 890830848 bytes of reserved memory.
^C
D:\Gdmt>

For example, on the GeForce GTX 690 (compilation, linking, and execution shown):

J:\Gdmt> nvcc -arch=compute_30 -code=sm_30 -c ./Gdmt.cu -o ./Gdmt.obj
Gdmt.cu
tmpxft_000011f0_00000000-5_Gdmt.cudafe1.gpu
tmpxft_000011f0_00000000-10_Gdmt.cudafe2.gpu
Gdmt.cu
tmpxft_000011f0_00000000-5_Gdmt.cudafe1.cpp
tmpxft_000011f0_00000000-15_Gdmt.ii

J:\Gdmt> nvcc ./Gdmt.obj -o ./Gdmt.exe

J:\Gdmt> nvcc -arch=compute_30 -code=sm_30 -c ./Gdmt_additional.cu -o ./Gdmt_additional.obj
Gdmt_additional.cu
tmpxft_00001164_00000000-5_Gdmt_additional.cudafe1.gpu
tmpxft_00001164_00000000-10_Gdmt_additional.cudafe2.gpu
Gdmt_additional.cu
tmpxft_00001164_00000000-5_Gdmt_additional.cudafe1.cpp
tmpxft_00001164_00000000-15_Gdmt_additional.ii

J:\Gdmt> nvcc ./Gdmt_additional.obj -o ./Gdmt_additional.exe

J:\Gdmt> Gdmt.exe
Total amount of memory: 2147483648 Bytes;
Memory to reserve: 2050109440 Bytes;
Warning, memory allocation process is not succeeded!
^C
J:\Gdmt> Gdmt_additional.exe
Allocation is succeeded on 1341915136 bytes of reserved memory.
^C

The examples are archived and available at:

(7z archive - 78.5 KB ~ 80,434 bytes) https://docs.google.com/file/d/0BzZ5q0v8n-qTTDctVDV5Mnh2ODA/edit
(zip archive - 163 KB ~ 167,457 bytes) https://docs.google.com/file/d/0BzZ5q0v8n-qTT2xoV3NXSzhQMDQ/edit

This question is cross-posted from the topics of the same name in "The GeForce Lounge" and "CUDA Programming and Performance".


Answer 1:


I reran your examples and got the same result.

I then approached the problem from the other side and tried to allocate blocks of decreasing size.

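#include <cuda_runtime.h>
#include <iostream>

int main()
{
    int *reservedMemory = 0;
    size_t const NBlockSize = 1300 * 1024 * 1024; // starting block size in bytes (varied between runs)
    size_t freeMemory = 0, totalMemory = 0;
    cudaError_t nErr = cudaSuccess;
    size_t nTotalAlloc = 0;
    std::cout << "NBlockSize(MB): " << NBlockSize / (1024 * 1024) << std::endl;
    while (nErr == cudaSuccess)
    {
        cudaMemGetInfo(&freeMemory, &totalMemory);
        std::cout << "===========================================================" << std::endl;
        std::cout << "Free/Total(kB): " << freeMemory / 1024 << "/" << totalMemory / 1024 << std::endl;

        // Halve the request until it fits into the reported free memory.
        size_t nAllocSize = NBlockSize;
        while (nAllocSize > freeMemory)
            nAllocSize /= 2;
        if (nAllocSize == 0)
            break; // nothing left to request; avoids looping forever on zero-byte allocations

        // The blocks are intentionally never freed: the goal is to reserve memory.
        nErr = cudaMalloc(&reservedMemory, nAllocSize);
        if (nErr == cudaSuccess)
            nTotalAlloc += nAllocSize;
        std::cout << "AllocSize(kB): " << nAllocSize / 1024
                  << ", percentage of freememory: " << static_cast<double>(nAllocSize) / freeMemory
                  << ", error: " << cudaGetErrorString(nErr) << std::endl;
    }
    std::cout << "TotalAlloc/Total (kB): " << nTotalAlloc / 1024 << "/" << totalMemory / 1024 << std::endl;
    return 0;
}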

The program starts with blocks of size NBlockSize and halves the allocation size whenever it exceeds freeMemory. Judging by the output below, cudaMalloc behaves somewhat unpredictably when the requested block is large relative to freeMemory. In one case it allocates more than 98% of the free memory; in another it fails to allocate 800 MB out of 1 GB of free memory.

The most interesting run is the one with a starting block size of 700 MB. It allocates 1400 kB out of 1428 kB in the last successful iteration, and then fails to allocate 10 kB out of 20 kB in the next one.

Depending on the starting size, the program allocated all free memory except 8 kB in the best run, and left over one gigabyte unallocated in the worst.

D:\Buildx64\Test\GMDT\Debug>Gdmt.exe
NBlockSize(MB): 1000
===========================================================
Free/Total(kB): 1797120/2097152
AllocSize(kB): 1024000, percentage of freememory: 0.569801, error: no error
===========================================================
Free/Total(kB): 773120/2097152
AllocSize(kB): 512000, percentage of freememory: 0.662252, error: no error
===========================================================
Free/Total(kB): 261120/2097152
AllocSize(kB): 256000, percentage of freememory: 0.980392, error: no error
===========================================================
Free/Total(kB): 5128/2097152
AllocSize(kB): 4000, percentage of freememory: 0.780031, error: no error
===========================================================
Free/Total(kB): 1032/2097152
AllocSize(kB): 1000, percentage of freememory: 0.968992, error: no error
===========================================================
Free/Total(kB): 8/2097152
AllocSize(kB): 7, percentage of freememory: 0.976563, error: out of memory
TotalAlloc/Total (kB): 1797000/2097152


D:\Buildx64\Test\GMDT\Debug>Gdmt.exe
NBlockSize(MB): 1200
===========================================================
Free/Total(kB): 1796864/2097152
AllocSize(kB): 1228800, percentage of freememory: 0.683858, error: no error
===========================================================
Free/Total(kB): 568072/2097152
AllocSize(kB): 307200, percentage of freememory: 0.540777, error: no error
===========================================================
Free/Total(kB): 260872/2097152
AllocSize(kB): 153600, percentage of freememory: 0.588795, error: no error
===========================================================
Free/Total(kB): 107272/2097152
AllocSize(kB): 76800, percentage of freememory: 0.715937, error: no error
===========================================================
Free/Total(kB): 30472/2097152
AllocSize(kB): 19200, percentage of freememory: 0.630087, error: no error
===========================================================
Free/Total(kB): 11272/2097152
AllocSize(kB): 9600, percentage of freememory: 0.851668, error: no error
===========================================================
Free/Total(kB): 1672/2097152
AllocSize(kB): 1200, percentage of freememory: 0.717703, error: no error
===========================================================
Free/Total(kB): 392/2097152
AllocSize(kB): 300, percentage of freememory: 0.765306, error: out of memory
TotalAlloc/Total (kB): 1796400/2097152

D:\Buildx64\Test\GMDT\Debug>Gdmt.exe
NBlockSize(MB): 800
===========================================================
Free/Total(kB): 1844448/2097152
AllocSize(kB): 819200, percentage of freememory: 0.444144, error: no error
===========================================================
Free/Total(kB): 1025248/2097152
AllocSize(kB): 819200, percentage of freememory: 0.799026, error: out of memory
TotalAlloc/Total (kB): 819200/2097152

D:\Buildx64\Test\GMDT\Debug>Gdmt.exe
NBlockSize(MB): 700
===========================================================
Free/Total(kB): 1835528/2097152
AllocSize(kB): 716800, percentage of freememory: 0.390514, error: no error
===========================================================
Free/Total(kB): 1118740/2097152
AllocSize(kB): 716800, percentage of freememory: 0.640721, error: no error
===========================================================
Free/Total(kB): 401940/2097152
AllocSize(kB): 358400, percentage of freememory: 0.891675, error: no error
===========================================================
Free/Total(kB): 43540/2097152
AllocSize(kB): 22400, percentage of freememory: 0.514469, error: no error
===========================================================
Free/Total(kB): 21140/2097152
AllocSize(kB): 11200, percentage of freememory: 0.529801, error: no error
===========================================================
Free/Total(kB): 9876/2097152
AllocSize(kB): 5600, percentage of freememory: 0.567031, error: no error
===========================================================
Free/Total(kB): 4244/2097152
AllocSize(kB): 2800, percentage of freememory: 0.659755, error: no error
===========================================================
Free/Total(kB): 1428/2097152
AllocSize(kB): 1400, percentage of freememory: 0.980392, error: no error
===========================================================
Free/Total(kB): 20/2097152
AllocSize(kB): 10, percentage of freememory: 0.546875, error: out of memory
TotalAlloc/Total (kB): 1835400/2097152
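
The loop in this answer stops at the first failed allocation, which is why the 800 MB run ends with over 1 GB still free. One possible variation, sketched here under assumptions, keeps halving the request after a failure instead of stopping. `reserve_greedy`, `total_reserved`, and the `try_alloc` callback are hypothetical names, not part of the CUDA runtime. In a CUDA program `try_alloc` would presumably wrap cudaMalloc and report whether it returned cudaSuccess. The sketch is host-only so that the strategy can be tried without a GPU.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical helper: greedily reserves memory by halving the request
// whenever an allocation fails, instead of stopping at the first failure.
// try_alloc(bytes) must return true if a block of that size was allocated.
// Returns the sizes of all blocks that were allocated successfully.
std::vector<std::size_t> reserve_greedy(
    std::size_t start_block,
    std::size_t min_block,
    const std::function<bool(std::size_t)>& try_alloc)
{
    std::vector<std::size_t> blocks;
    std::size_t request = start_block;
    while (request > 0 && request >= min_block) {
        if (try_alloc(request)) {
            blocks.push_back(request);  // success: try the same size again
        } else {
            request /= 2;               // failure: retry with half the size
        }
    }
    return blocks;
}

// Sums the sizes of the reserved blocks.
std::size_t total_reserved(const std::vector<std::size_t>& blocks)
{
    std::size_t total = 0;
    for (std::size_t i = 0; i < blocks.size(); ++i)
        total += blocks[i];
    return total;
}
```

The `min_block` parameter bounds the number of retries. Whether this recovers more memory on the GTX 690 than the original loop has not been measured here.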



Answer 2:


I recently remembered the page-locked memory mechanism in CUDA. I tested it, but the performance was not satisfactory: the calculation using this mechanism is ten times slower than the version that uses the very limited memory reservation available on Windows with the GeForce GTX 690. I had assumed that the data would be copied to the device automatically for the calculation and written back afterwards, but in fact the device memory is not involved.



Source: https://stackoverflow.com/questions/14721028/how-to-allocate-all-available-global-memory-on-the-geforce-gtx-690-device
