CUDA thread execution order

野的像风 2021-01-23 06:49

I have the following code for a CUDA program:

#include <stdio.h>

#define NUM_BLOCKS 4
#define THREADS_PER_BLOCK 4

__global__ void hello()
{
   printf(

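The kernel above is cut off at `printf(` in the source, so its exact format string cannot be recovered. Below is a minimal complete program in the same spirit. It assumes, hypothetically, that each thread prints its `threadIdx.x` and `blockIdx.x`. The format string and the `main` function are illustrative, not the asker's original code.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define NUM_BLOCKS 4
#define THREADS_PER_BLOCK 4

// Hypothetical completion of the truncated kernel: each thread reports its indices.
__global__ void hello()
{
    printf("thread %u in block %u\n", threadIdx.x, blockIdx.x);
}

int main()
{
    hello<<<NUM_BLOCKS, THREADS_PER_BLOCK>>>();

    // Device-side printf output is buffered; synchronizing flushes it to stdout.
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}
```

Built with something like `nvcc hello.cu -o hello`, this produces 16 lines, one per thread. The question below is about the order in which those lines appear.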
2 Answers
  • Quoting the question: "However, the thread order in every block is always 0, 1, 2, 3. Why is this happening? I thought it would be random too."

    With 4 threads per block, you are launching only one warp per block. In CUDA, the unit of execution (and of scheduling and resource assignment) is the warp, not the thread. Currently, a warp consists of 32 threads.

    This means that all 4 threads in each block execute in lockstep, since there is no conditional behavior in this case. When they reach the printf call, they all execute it together, at the same point in the code.
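    The warp arithmetic can be made concrete. The sketch below assumes the 32-thread warp size stated above; device code can also read the built-in `warpSize`. The helper names `warps_per_block`, `warp_of` and `lane_of` are hypothetical. With `THREADS_PER_BLOCK` = 4, every thread lands in warp 0 of its block. The mask returned by `__activemask()` at the `printf` site can contain at most the low four lanes (`0xf`), because only four lanes exist.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define NUM_BLOCKS 4
#define THREADS_PER_BLOCK 4
#define WARP_SIZE 32  // assumption: matches the built-in warpSize on current NVIDIA GPUs

// Number of warps needed to hold `threads` threads, rounded up.
__host__ __device__ int warps_per_block(int threads)
{
    return (threads + WARP_SIZE - 1) / WARP_SIZE;
}

// Warp index within the block, and lane index within that warp.
__host__ __device__ int warp_of(int tid) { return tid / WARP_SIZE; }
__host__ __device__ int lane_of(int tid) { return tid % WARP_SIZE; }

__global__ void whoami()
{
    int tid = threadIdx.x;
    // Mask of the lanes in this warp that are currently active at this call.
    unsigned int active = __activemask();
    printf("block %u thread %d: warp %d lane %d, active mask 0x%x\n",
           blockIdx.x, tid, warp_of(tid), lane_of(tid), active);
}

int main()
{
    printf("%d threads per block -> %d warp(s) per block\n",
           THREADS_PER_BLOCK, warps_per_block(THREADS_PER_BLOCK));
    whoami<<<NUM_BLOCKS, THREADS_PER_BLOCK>>>();
    cudaDeviceSynchronize();  // flush device-side printf output
    return 0;
}
```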

    So the question becomes: in this situation, how does the CUDA runtime dispatch these "simultaneous" function calls? The answer is unspecified, but it is not "random". It is therefore reasonable that the dispatch order for operations within a warp does not change from run to run.

    If you launch enough threads to create multiple warps per block, and probably also add some code that disperses and/or "randomizes" the behavior of different warps, you should see printf output from separate warps appear in "random" order.
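    Here is a sketch of that experiment under stated assumptions. It uses 128 threads per block (four warps). A hypothetical `mix` helper, the MurmurHash3 32-bit finalizer, turns the block index, warp index and a host-supplied seed into a pseudo-random busy-wait on `clock64()`. Only lane 0 prints, so each warp produces one line. The 100000-cycle delay bound and the launch of 2 blocks are arbitrary choices.

```cuda
#include <cstdio>
#include <ctime>
#include <cuda_runtime.h>

#define NUM_BLOCKS 2
#define THREADS_PER_BLOCK 128  // 4 warps of 32 threads each
#define WARP_SIZE 32

// MurmurHash3 fmix32 finalizer: a bijective 32-bit mixer, used here only to
// turn (seed, block, warp) into a pseudo-random delay.
__host__ __device__ unsigned int mix(unsigned int x)
{
    x ^= x >> 16;
    x *= 0x85ebca6bu;
    x ^= x >> 13;
    x *= 0xc2b2ae35u;
    x ^= x >> 16;
    return x;
}

__global__ void hello_warps(unsigned int seed)
{
    unsigned int warp = threadIdx.x / WARP_SIZE;
    unsigned int lane = threadIdx.x % WARP_SIZE;

    // Warp-uniform delay: every lane of a warp computes the same value,
    // so the whole warp spins together instead of diverging.
    long long delay = mix(seed ^ (blockIdx.x * 64u + warp)) % 100000u;
    long long start = clock64();
    while (clock64() - start < delay) { }

    // One line per warp keeps the output readable.
    if (lane == 0)
        printf("block %u warp %u\n", blockIdx.x, warp);
}

int main()
{
    hello_warps<<<NUM_BLOCKS, THREADS_PER_BLOCK>>>((unsigned int)time(nullptr));
    cudaDeviceSynchronize();  // flush device-side printf output
    return 0;
}
```

    The seed comes from `time()`, so the delays change between runs. The per-warp lines within a block are therefore expected to appear in varying order from one run to the next.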

  • 2021-01-23 07:41

    To answer the second part of your question: when control flow diverges at the if statement, the threads where threadIdx.x != 0 simply wait at the convergence point after the if statement. They do not go on to the printf statement until thread 0 has completed the if block.
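    The question's `if` block is not visible in the truncated code, so the kernel below is a hypothetical stand-in. Thread 0 does extra work inside `if (threadIdx.x == 0)`, and then every thread prints. The sketch makes the convergence point explicit with `__syncwarp()`. On GPUs with independent thread scheduling (Volta and later), this is the documented way to synchronize a warp, rather than relying on implicit reconvergence. `lane_mask` is a hypothetical helper that builds a mask containing only the lanes that exist in a partial warp.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define NUM_BLOCKS 2
#define THREADS_PER_BLOCK 4
#define WARP_SIZE 32

// Mask with one bit per lane that actually exists in a (possibly partial) warp.
__host__ __device__ unsigned int lane_mask(int lanes)
{
    return lanes >= WARP_SIZE ? 0xffffffffu : ((1u << lanes) - 1u);
}

__global__ void diverge()
{
    int warp_start = (threadIdx.x / WARP_SIZE) * WARP_SIZE;
    unsigned int mask = lane_mask(blockDim.x - warp_start);

    if (threadIdx.x == 0) {
        // Only thread 0 takes this branch.
        printf("block %u: thread 0 inside the if block\n", blockIdx.x);
    }

    // Explicit convergence point: each lane in `mask` waits here until
    // every lane in `mask` has reached this __syncwarp.
    __syncwarp(mask);

    printf("block %u: thread %u after the if block\n", blockIdx.x, threadIdx.x);
}

int main()
{
    diverge<<<NUM_BLOCKS, THREADS_PER_BLOCK>>>();
    cudaDeviceSynchronize();  // flush device-side printf output
    return 0;
}
```

    With 4 threads per block the mask is `0xf`. In each block, thread 0's "inside" line is expected to be printed before that block's four "after" lines.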
