Question
In CUDA, how is stream 0 related to other streams? Does stream 0 (the default stream) execute concurrently with other streams in the same context or not?
Consider the following example:
cudaMemcpy(Dst, Src, sizeof(float)*datasize, cudaMemcpyHostToDevice); // stream 0
cudaStream_t stream1;
/* ...creating stream1... */
somekernel<<<blocks, threads, 0, stream1>>>(Dst); // stream 1
In the above code, can the compiler ensure that somekernel always launches AFTER cudaMemcpy finishes, or will somekernel execute concurrently with cudaMemcpy?
Answer 1:
The cudaMemcpy call is (in all but a particular case) a synchronous call. The host thread running that code blocks until the memory transfer has completed. It cannot proceed to launch the kernel until the cudaMemcpy call has returned, and that doesn't happen until the copy operation is completed.
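As a minimal sketch, the question's example can be completed into a runnable program that checks the kernel really sees the copied data. It assumes a CUDA toolkit and GPU. The value of `datasize`, the launch configuration, the body of `somekernel`, its extra `n` parameter, the `CHECK` macro and the helper `copyThenKernelMismatches` are all hypothetical, since the question does not specify them. `stream1` is created with `cudaStreamCreate`, i.e. a blocking stream, so under the legacy default stream it would also wait for earlier stream-0 work even if `cudaMemcpy` returned early.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                \
    do {                                                           \
        cudaError_t err_ = (call);                                 \
        if (err_ != cudaSuccess) {                                 \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,     \
                    cudaGetErrorString(err_));                     \
            exit(1);                                               \
        }                                                          \
    } while (0)

// Hypothetical body: the question does not show somekernel, so this version
// doubles each element. The extra `n` parameter is also an assumption.
__global__ void somekernel(float *Dst, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) Dst[i] *= 2.0f;
}

// Runs the question's copy-then-launch sequence and returns how many
// elements do not hold the expected value (2.0f).
int copyThenKernelMismatches() {
    const int datasize = 1 << 20;            // assumed size
    std::vector<float> Src(datasize, 1.0f);  // pageable host buffer of 1.0f
    float *Dst = nullptr;
    CHECK(cudaMalloc(&Dst, sizeof(float) * datasize));

    // Stream 0: the host thread does not continue until cudaMemcpy returns.
    CHECK(cudaMemcpy(Dst, Src.data(), sizeof(float) * datasize,
                     cudaMemcpyHostToDevice));

    cudaStream_t stream1;
    CHECK(cudaStreamCreate(&stream1));       // blocking stream (default flags)

    const int threads = 256;
    const int blocks = (datasize + threads - 1) / threads;
    somekernel<<<blocks, threads, 0, stream1>>>(Dst, datasize);  // stream 1
    CHECK(cudaGetLastError());
    CHECK(cudaStreamSynchronize(stream1));

    std::vector<float> Out(datasize);
    CHECK(cudaMemcpy(Out.data(), Dst, sizeof(float) * datasize,
                     cudaMemcpyDeviceToHost));

    int mismatches = 0;
    for (float v : Out)
        if (v != 2.0f) ++mismatches;

    CHECK(cudaStreamDestroy(stream1));
    CHECK(cudaFree(Dst));
    return mismatches;
}

int main() {
    int bad = copyThenKernelMismatches();
    printf("mismatches: %d\n", bad);
    return bad == 0 ? 0 : 1;
}
```

If the kernel had run before the copy finished, some elements would not be 2.0f and the count would be non-zero.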
More generally, the default stream (stream 0, or the null stream) implicitly serializes operations on the GPU whenever an operation is active in that stream. If you create streams and push operations into them while an operation is being performed in the default stream, all concurrency in those streams is lost until the default stream is idle.
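The serialization described above can be observed with a sketch like the following. It assumes the program is built with nvcc's default (legacy) default stream, not `--default-stream per-thread`, on a GPU that supports concurrent kernels; `spin`, `overlapsDefaultStream`, `CHECK` and the cycle count are hypothetical. A long spin kernel goes into stream 0, a short kernel into another stream, and after waiting on only the other stream the code asks whether stream 0 is still busy. With a stream from the default flags (`cudaStreamDefault`), the short kernel cannot start until the stream-0 kernel finishes, so the answer is always "no". With `cudaStreamNonBlocking`, which does not synchronize with the legacy default stream, the short kernel typically finishes while stream 0 is still spinning.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                \
    do {                                                           \
        cudaError_t err_ = (call);                                 \
        if (err_ != cudaSuccess) {                                 \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,     \
                    cudaGetErrorString(err_));                     \
            exit(1);                                               \
        }                                                          \
    } while (0)

// Busy-waits for roughly `cycles` SM clock cycles.
__global__ void spin(long long cycles) {
    long long start = clock64();
    while (clock64() - start < cycles) {
    }
}

// Puts a long spin in the legacy default stream, then a short spin in a new
// stream created with `flags`. After waiting on the new stream only, reports
// whether the default-stream kernel was still running.
bool overlapsDefaultStream(unsigned int flags, long long cycles) {
    cudaStream_t s;
    CHECK(cudaStreamCreateWithFlags(&s, flags));

    spin<<<1, 1>>>(cycles);        // no stream argument: stream 0
    spin<<<1, 1, 0, s>>>(1000);    // short kernel in the other stream
    CHECK(cudaGetLastError());
    CHECK(cudaStreamSynchronize(s));

    // cudaErrorNotReady means stream 0 still has unfinished work.
    bool defaultStillBusy = (cudaStreamQuery(0) == cudaErrorNotReady);

    CHECK(cudaDeviceSynchronize());  // drain all work before returning
    CHECK(cudaStreamDestroy(s));
    return defaultStillBusy;
}

int main() {
    const long long cycles = 500000000LL;  // assumed: long enough to observe
    bool blocking = overlapsDefaultStream(cudaStreamDefault, cycles);
    bool nonBlocking = overlapsDefaultStream(cudaStreamNonBlocking, cycles);
    printf("blocking stream finished while stream 0 busy:     %s\n",
           blocking ? "yes" : "no");
    printf("non-blocking stream finished while stream 0 busy: %s\n",
           nonBlocking ? "yes" : "no");
    return 0;
}
```

Only the blocking case is deterministic; the non-blocking result depends on timing and on the device actually running the two kernels concurrently.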
Source: https://stackoverflow.com/questions/18443205/the-behavior-of-stream-0-default-and-other-streams