What device number should I use (0 or 1), to copy P2P (GPU0->GPU1)?

忘了有多久 2021-01-07 13:36

Which device number, 0 or 1, must I set with cudaSetDevice() to copy P2P (GPU0->GPU1) using cudaStreamCreate(stream); cudaMemcpyPeerAsync(p1, 1,

1 answer
  •  情话喂你
    2021-01-07 14:18

    In the call to cudaMemcpyPeerAsync you can specify a non-default stream. So your first question is: which device should I set by cudaSetDevice before the call to cudaMemcpyPeerAsync?

    The answer is that you have to set, with cudaSetDevice, the device for which the stream was created. You can use a stream created for either the source or the destination device. Although, to the best of my knowledge, this is not explicitly mentioned in the documentation, it can be inferred from Robert Crovella's answer to How to define destination device stream in cudaMemcpyPeerAsync()?. Please note that, as of 2011 and according to Multi-GPU Programming, performance is maximized when the stream belongs to the source GPU.

    Let me recall some important points about using streams in a multi-GPU setting, borrowed from Multi-GPU Programming, which support the above statements:

    1. CUDA streams are per device;
    2. streams are determined by the GPU that was current at the time of their creation;
    3. Calls to a stream can be issued only when its device is current.
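
    The points above can be sketched as a small host-only program. This is a minimal illustration, not a definitive implementation: it assumes a machine with at least two GPUs numbered 0 and 1, and the names peer_copy_demo, d_src, d_dst and the CUDA_CHECK macro are hypothetical helpers introduced here, not part of the question. The stream is created while GPU 0 (the source) is current, and GPU 0 is still current when cudaMemcpyPeerAsync is issued on that stream.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                            \
    do {                                                            \
        cudaError_t err_ = (call);                                  \
        if (err_ != cudaSuccess) {                                  \
            std::fprintf(stderr, "%s failed: %s\n", #call,          \
                         cudaGetErrorString(err_));                 \
            std::exit(EXIT_FAILURE);                                \
        }                                                           \
    } while (0)

// Copies n ints from GPU 0 to GPU 1 on a stream that belongs to GPU 0.
// Returns 0 if the data arrived intact, 1 if fewer than two GPUs are
// available (nothing is copied), and -1 if the copied data does not match.
int peer_copy_demo(size_t n) {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count < 2) return 1;

    const int src = 0, dst = 1;  // assumption: GPU0 -> GPU1, as in the question
    const size_t bytes = n * sizeof(int);
    std::vector<int> h_in(n), h_out(n, 0);
    for (size_t i = 0; i < n; ++i) h_in[i] = static_cast<int>(i);

    int *d_src = nullptr, *d_dst = nullptr;

    // Allocate the destination buffer while GPU 1 is current.
    CUDA_CHECK(cudaSetDevice(dst));
    CUDA_CHECK(cudaMalloc(&d_dst, bytes));

    // Make GPU 0 current: the buffer and the stream created now belong to it.
    CUDA_CHECK(cudaSetDevice(src));
    CUDA_CHECK(cudaMalloc(&d_src, bytes));
    CUDA_CHECK(cudaMemcpy(d_src, h_in.data(), bytes, cudaMemcpyHostToDevice));
    cudaStream_t stream;
    CUDA_CHECK(cudaStreamCreate(&stream));

    // GPU 0 is still current, so the copy is issued on a stream of the
    // current device. Argument order: dst ptr, dst device, src ptr, src device.
    CUDA_CHECK(cudaMemcpyPeerAsync(d_dst, dst, d_src, src, bytes, stream));
    CUDA_CHECK(cudaStreamSynchronize(stream));

    // Read the data back from GPU 1 to verify the copy.
    CUDA_CHECK(cudaSetDevice(dst));
    CUDA_CHECK(cudaMemcpy(h_out.data(), d_dst, bytes, cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(d_dst));

    // Release the resources owned by GPU 0.
    CUDA_CHECK(cudaSetDevice(src));
    CUDA_CHECK(cudaStreamDestroy(stream));
    CUDA_CHECK(cudaFree(d_src));

    return h_in == h_out ? 0 : -1;
}

int main() {
    const int rc = peer_copy_demo(1 << 20);
    if (rc == 1) std::printf("skipped: fewer than two GPUs\n");
    else std::printf(rc == 0 ? "peer copy OK\n" : "peer copy mismatch\n");
    return rc < 0 ? EXIT_FAILURE : EXIT_SUCCESS;
}
```

    To try the other option mentioned above, a stream owned by the destination GPU, one would create the stream after cudaSetDevice(1) and keep device 1 current when issuing the copy. The sketch does not call cudaDeviceEnablePeerAccess; as far as I know the peer copy still works without it, although the transfer may then be staged through host memory rather than going directly between the GPUs.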
