The behavior of stream 0 (default) and other streams

Question

In CUDA, how is stream 0 related to other streams? Does stream 0 (default stream) execute concurrently with other streams in a context or not?

Consider the following example:

cudaMemcpy(Dst, Src, sizeof(float)*datasize, cudaMemcpyHostToDevice); // stream 0
cudaStream_t stream1;
/* ...creating stream1... */
somekernel<<<blocks, threads, 0, stream1>>>(Dst); // stream 1

In the above code, can the compiler ensure that somekernel always launches AFTER cudaMemcpy finishes, or will somekernel execute concurrently with cudaMemcpy?

Answer 1:

A cudaMemcpy call is (in all but a particular case) a synchronous call. The host thread running that code blocks until the memory transfer has completed. It cannot proceed to launch the kernel until the cudaMemcpy call has returned, and that doesn't happen until the copy operation is completed.
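To make the ordering concrete, here is a minimal sketch that fills in the question's snippet. The kernel body, the launch configuration, the data size, and the CHECK macro are assumptions added for illustration; only the cudaMemcpy call and the launch into stream1 come from the question. It also assumes the legacy default stream (the default nvcc behavior) and a stream created with plain cudaStreamCreate.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical error-checking helper, not part of the original question.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "CUDA error: %s (line %d)\n",             \
                    cudaGetErrorString(err_), __LINE__);              \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Illustrative stand-in for the question's somekernel: doubles each element.
__global__ void somekernel(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

int main() {
    const int datasize = 1 << 20;                 // assumed size
    const size_t bytes = sizeof(float) * datasize;

    float *Src = static_cast<float *>(malloc(bytes));
    if (Src == nullptr) return EXIT_FAILURE;
    for (int i = 0; i < datasize; ++i) Src[i] = 1.0f;

    float *Dst = nullptr;
    CHECK(cudaMalloc(&Dst, bytes));

    // Blocking copy issued to stream 0: the host thread does not reach the
    // kernel launch below until this call has returned.
    CHECK(cudaMemcpy(Dst, Src, bytes, cudaMemcpyHostToDevice));

    cudaStream_t stream1;
    CHECK(cudaStreamCreate(&stream1));

    const int threads = 256;
    const int blocks = (datasize + threads - 1) / threads;
    somekernel<<<blocks, threads, 0, stream1>>>(Dst, datasize);
    CHECK(cudaGetLastError());
    CHECK(cudaStreamSynchronize(stream1));

    // Copy the result back and inspect one element.
    CHECK(cudaMemcpy(Src, Dst, bytes, cudaMemcpyDeviceToHost));
    printf("Src[0] = %f\n", Src[0]);

    CHECK(cudaStreamDestroy(stream1));
    CHECK(cudaFree(Dst));
    free(Src);
    return 0;
}
```

If the copy is ordered before the kernel, as the answer describes, every element read back should be twice its initial value. The ordering comes from the runtime's behavior, not from anything the compiler does.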

More generally, the default stream (0 or null) implicitly serializes operations on the GPU whenever an operation is active in that stream. If you create streams and push operations into them at the same time as an operation is being performed in the default stream, all concurrency in those streams is lost until the default stream is idle.
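The serialization described above can be observed with a rough timing experiment. The sketch below is an illustration under stated assumptions, not a benchmark: the spin kernel, the cycle count, and the helper names are made up for the example, and it assumes the legacy default stream (that is, not compiled with per-thread default streams). It launches one kernel into each of two created streams with a default-stream kernel in between, first with ordinary blocking streams and then with streams created using cudaStreamNonBlocking, which opt out of implicit synchronization with stream 0.

```cuda
#include <chrono>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical error-checking helper for this sketch.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "CUDA error: %s (line %d)\n",             \
                    cudaGetErrorString(err_), __LINE__);              \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Illustrative kernel: busy-waits for roughly `cycles` clock ticks.
__global__ void spin(long long cycles) {
    long long start = clock64();
    while (clock64() - start < cycles) { }
}

// Launches stream1 kernel, default-stream kernel, stream2 kernel, and
// returns the host-side wall time in milliseconds until all have finished.
static double timeLaunches(unsigned int flags) {
    cudaStream_t s1, s2;
    CHECK(cudaStreamCreateWithFlags(&s1, flags));
    CHECK(cudaStreamCreateWithFlags(&s2, flags));

    const long long cycles = 100000000LL;  // assumed duration, tune as needed
    CHECK(cudaDeviceSynchronize());

    auto t0 = std::chrono::steady_clock::now();
    spin<<<1, 1, 0, s1>>>(cycles);   // created stream
    spin<<<1, 1>>>(cycles);          // default stream (stream 0)
    spin<<<1, 1, 0, s2>>>(cycles);   // created stream
    CHECK(cudaGetLastError());
    CHECK(cudaDeviceSynchronize());
    auto t1 = std::chrono::steady_clock::now();

    CHECK(cudaStreamDestroy(s1));
    CHECK(cudaStreamDestroy(s2));
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}

int main() {
    // Warm-up launch so one-time initialization cost is not measured.
    spin<<<1, 1>>>(1000);
    CHECK(cudaDeviceSynchronize());

    double blocking = timeLaunches(cudaStreamDefault);
    double nonBlocking = timeLaunches(cudaStreamNonBlocking);

    printf("blocking streams:     %.2f ms\n", blocking);
    printf("non-blocking streams: %.2f ms\n", nonBlocking);
    return 0;
}
```

On a device that supports concurrent kernel execution, one would expect the blocking case to take roughly three kernel durations, since the default-stream kernel waits for stream 1 and stream 2 waits for the default stream. The non-blocking case should come closer to a single kernel duration. Actual numbers depend on the GPU and driver.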

Source: https://stackoverflow.com/questions/18443205/the-behavior-of-stream-0-default-and-other-streams

Tags

cuda

gpu

nvidia

cuda-streams
