What device number should I use (0 or 1), to copy P2P (GPU0->GPU1)?

Submitted by 一笑奈何 on 2019-12-19 11:48:56

Question


Which device should I pass to cudaSetDevice(), 0 or 1, before a P2P copy (GPU0->GPU1) that uses cudaStreamCreate(&stream); cudaMemcpyPeerAsync(p1, 1, p0, 0, size, stream); ?

Code:

// Set device 0 as current
cudaSetDevice(0); 
float* p0;
size_t size = 1024 * sizeof(float);
// Allocate memory on device 0
cudaMalloc(&p0, size); 
// Set device 1 as current
cudaSetDevice(1); 
float* p1;
// Allocate memory on device 1
cudaMalloc(&p1, size); 
// Set device 0 as current
cudaSetDevice(0);
// Launch kernel on device 0
MyKernel<<<1000, 128>>>(p0); 

// Which device should be current here, 0 or 1?
cudaSetDevice(1); // cudaSetDevice(0); 
cudaStream_t stream;
// cudaStreamCreate takes a pointer to the stream handle
cudaStreamCreate(&stream);

// Copy p0 to p1
cudaMemcpyPeerAsync(p1, 1, p0, 0, size, stream); 
cudaStreamSynchronize(stream);

// Launch kernel on device 1
cudaSetDevice(1); 
MyKernel<<<1000, 128>>>(p1);

UPDATE 31.03.2014: Or does the current context matter only for a __global__ kernel function, and not for cudaMemcpyPeerAsync()? And for cudaMemcpyAsync() and cudaMemcpyPeerAsync(), is the only thing that matters that the stream was created for the device the data is copied from (the device of the source pointer)?


Answer 1:


In the call to cudaMemcpyPeerAsync you can specify a non-default stream. So your first question is: which device should be made current with cudaSetDevice before the call to cudaMemcpyPeerAsync?

The answer is that you must make current, with cudaSetDevice, the device for which the stream was created. You can use a stream created for either the source or the destination device. To the best of my knowledge this is not explicitly stated in the documentation, but it can be inferred from Robert Crovella's answer to "How to define destination device stream in cudaMemcpyPeerAsync()?". Note that, as of 2011 and according to Multi-GPU Programming, performance is maximized when the stream belongs to the source GPU.

Let me recall some important points about using streams in a multi-GPU setting, borrowed from Multi-GPU Programming, which support the statements above:

  1. CUDA streams are per device.
  2. A stream belongs to the GPU that was current when the stream was created.
  3. Calls to a stream can be issued only when its device is current.
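
The points above can be put together in a minimal sketch of the case where the stream belongs to the source GPU. It is only an illustration under a few assumptions: the CHECK macro and the runPeerCopy function are hypothetical helpers that are not part of the question, MyKernel is a placeholder that just fills the buffer with 1.0f, and the kernel is launched into the same stream as the copy so that the copy is ordered after it.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper (not from the question): abort if a CUDA runtime call fails.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,   \
                         cudaGetErrorString(err_));                   \
            std::exit(EXIT_FAILURE);                                  \
        }                                                             \
    } while (0)

// Placeholder for MyKernel from the question: fills the buffer with 1.0f.
__global__ void MyKernel(float* p, size_t n) {
    size_t i = blockIdx.x * static_cast<size_t>(blockDim.x) + threadIdx.x;
    if (i < n) p[i] = 1.0f;
}

// Hypothetical driver: returns the number of mismatched elements after the
// copy, or -1 if fewer than two GPUs are available.
int runPeerCopy(size_t n) {
    int deviceCount = 0;
    if (cudaGetDeviceCount(&deviceCount) != cudaSuccess || deviceCount < 2)
        return -1;

    const size_t size = n * sizeof(float);
    const unsigned int threads = 128;
    const unsigned int blocks =
        static_cast<unsigned int>((n + threads - 1) / threads);
    float *p0 = nullptr, *p1 = nullptr;

    // Allocate the destination buffer on device 1.
    CHECK(cudaSetDevice(1));
    CHECK(cudaMalloc(&p1, size));

    // Make the source device current BEFORE creating the stream,
    // so the stream belongs to device 0.
    CHECK(cudaSetDevice(0));
    CHECK(cudaMalloc(&p0, size));
    cudaStream_t stream;
    CHECK(cudaStreamCreate(&stream));

    // Device 0 is still current, so work can be issued to its stream.
    MyKernel<<<blocks, threads, 0, stream>>>(p0, n);
    CHECK(cudaGetLastError());
    CHECK(cudaMemcpyPeerAsync(p1, 1, p0, 0, size, stream));
    CHECK(cudaStreamSynchronize(stream));

    // Switch to device 1 to read back and check the copied data.
    CHECK(cudaSetDevice(1));
    std::vector<float> host(n, 0.0f);
    CHECK(cudaMemcpy(host.data(), p1, size, cudaMemcpyDeviceToHost));
    int mismatches = 0;
    for (float v : host)
        if (v != 1.0f) ++mismatches;
    CHECK(cudaFree(p1));

    // Release device 0 resources with device 0 current.
    CHECK(cudaSetDevice(0));
    CHECK(cudaStreamDestroy(stream));
    CHECK(cudaFree(p0));
    return mismatches;
}

int main() {
    int mismatches = runPeerCopy(1024);
    if (mismatches < 0)
        std::printf("Skipped: fewer than two GPUs found.\n");
    else
        std::printf("Mismatched elements: %d\n", mismatches);
    return mismatches > 0 ? EXIT_FAILURE : EXIT_SUCCESS;
}
```

To try the destination-side variant instead, the stream would be created while device 1 is current and the copy issued with device 1 current. In that case the kernel on device 0 is no longer in the same stream as the copy, so it would need to be synchronized separately before the copy starts.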


Source: https://stackoverflow.com/questions/22736159/what-device-number-should-i-use-0-or-1-to-copy-p2p-gpu0-gpu1
