Which device number, 0 or 1, do I have to set in cudaSetDevice(); to copy P2P (GPU0->GPU1) using cudaStreamCreate(stream); cudaMemcpyPeerAsync(p1, 1,
In the call to cudaMemcpyPeerAsync you can specify a non-default stream. So your first question is: which device should I set with cudaSetDevice before the call to cudaMemcpyPeerAsync?
The answer is that you have to set, with cudaSetDevice, the device for which the stream was created. You can use a stream created for either the source or the destination device. Although, to the best of my knowledge, this is not explicitly mentioned in the documentation, it can be inferred from Robert Crovella's answer to "How to define destination device stream in cudaMemcpyPeerAsync()?". Please note that, as of 2011 and according to Multi-GPU Programming, performance is maximized when the stream belongs to the source GPU.
Let me recall some important points about using streams in a multi-GPU setting, borrowed from Multi-GPU Programming, which support the above statements:
streams are per device;
streams are determined by the GPU that was current at the time of their creation;
calls to a stream can be issued only when its device is current.
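
As an illustration of the rule above, here is a minimal sketch of a GPU0->GPU1 copy in which the stream is created while the source device (0) is current, and that same device is still current when cudaMemcpyPeerAsync is issued. The CHECK macro, the buffer names src and dst, and the 1 MiB size are my own assumptions for the example; they do not come from the question. I have not tuned or benchmarked this, so take it as a starting point only.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper (not part of CUDA): abort if a runtime call fails.
#define CHECK(call)                                                  \
    do {                                                             \
        cudaError_t err_ = (call);                                   \
        if (err_ != cudaSuccess) {                                   \
            std::fprintf(stderr, "%s failed: %s\n", #call,           \
                         cudaGetErrorString(err_));                  \
            std::exit(EXIT_FAILURE);                                 \
        }                                                            \
    } while (0)

int main() {
    int deviceCount = 0;
    CHECK(cudaGetDeviceCount(&deviceCount));
    if (deviceCount < 2) {
        std::printf("This sketch needs at least two GPUs.\n");
        return 0;
    }

    const size_t bytes = 1 << 20;  // assumed buffer size: 1 MiB
    void *src = nullptr;
    void *dst = nullptr;

    // Destination buffer lives on device 1.
    CHECK(cudaSetDevice(1));
    CHECK(cudaMalloc(&dst, bytes));

    // Source buffer lives on device 0.
    CHECK(cudaSetDevice(0));
    CHECK(cudaMalloc(&src, bytes));
    CHECK(cudaMemset(src, 0, bytes));

    // Device 0 is current here, so the stream belongs to device 0 (the source).
    cudaStream_t stream;
    CHECK(cudaStreamCreate(&stream));

    // Device 0 is still current: issue the peer copy GPU0 -> GPU1 in its stream.
    CHECK(cudaMemcpyPeerAsync(dst, 1, src, 0, bytes, stream));
    CHECK(cudaStreamSynchronize(stream));

    // Clean up, making each buffer's owning device current before freeing it.
    CHECK(cudaStreamDestroy(stream));
    CHECK(cudaFree(src));
    CHECK(cudaSetDevice(1));
    CHECK(cudaFree(dst));
    return 0;
}
```

To use a stream of the destination GPU instead, you would call cudaSetDevice(1) before both cudaStreamCreate and cudaMemcpyPeerAsync; the arguments of the copy itself stay the same.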