As title, I would like to know the right execution order in case we have a 3d block
I think to remember that I read already something regarding it, but it was some t
Yes, that is the correct ordering; threads are ordered with the x dimension varying first, then y, then z (equivalent to column-major order) within a block. The calculation can be expressed as
int threadID = threadIdx.x +
blockDim.x * threadIdx.y +
(blockDim.x * blockDim.y) * threadIdx.z;
int warpID = threadID / warpSize;
int laneID = threadID % warpSize;
Here threadID is the thread number within the block, warpID is the warp number within the block, and laneID is the thread number within the warp.
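As a quick check of the formula, here is a minimal sketch that launches a single 4 x 4 x 4 block and has every thread record its own warp and lane number. The helper name linearThreadId, the kernel name recordIds, and the block shape are made up for illustration; only threadIdx, blockDim and warpSize come from CUDA itself.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: linear thread number within a block,
// with x varying fastest, then y, then z.
__host__ __device__ inline int linearThreadId(int tx, int ty, int tz,
                                              int dimX, int dimY)
{
    return tx + dimX * ty + (dimX * dimY) * tz;
}

// Each thread stores its warp and lane number at its own linear index.
__global__ void recordIds(int *warpIds, int *laneIds)
{
    int threadID = linearThreadId(threadIdx.x, threadIdx.y, threadIdx.z,
                                  blockDim.x, blockDim.y);
    warpIds[threadID] = threadID / warpSize;
    laneIds[threadID] = threadID % warpSize;
}

int main()
{
    const dim3 block(4, 4, 4);                          // illustrative shape
    const int n = (int)(block.x * block.y * block.z);   // 64 threads

    int *warpIds = nullptr;
    int *laneIds = nullptr;
    if (cudaMallocManaged(&warpIds, n * sizeof(int)) != cudaSuccess ||
        cudaMallocManaged(&laneIds, n * sizeof(int)) != cudaSuccess) {
        fprintf(stderr, "allocation failed\n");
        return 1;
    }

    recordIds<<<1, block>>>(warpIds, laneIds);
    if (cudaDeviceSynchronize() != cudaSuccess) {
        fprintf(stderr, "kernel launch or execution failed\n");
        return 1;
    }

    // Recompute the linear ID on the host and print what each thread stored.
    for (int z = 0; z < (int)block.z; ++z)
        for (int y = 0; y < (int)block.y; ++y)
            for (int x = 0; x < (int)block.x; ++x) {
                int id = linearThreadId(x, y, z, (int)block.x, (int)block.y);
                printf("(%d,%d,%d) -> threadID %2d, warp %d, lane %2d\n",
                       x, y, z, id, warpIds[id], laneIds[id]);
            }

    cudaFree(warpIds);
    cudaFree(laneIds);
    return 0;
}
```

With warpSize equal to 32, thread (0,0,2) has threadID 0 + 4*0 + 16*2 = 32, so it starts the second warp as lane 0.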
Note that threads are not necessarily executed in any sort of predictable order related to this ordering within a block. The execution model guarantees that threads in the same warp are executed "lock-step", but you can't infer any more than that from the thread numbering within a block.
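In practice this means that any exchange of data between threads of a block should go through an explicit barrier rather than rely on the numbering. A minimal sketch, assuming a hypothetical kernel named reverseInBlock and a block of at most 1024 threads, where each thread reads the value written by the thread with the mirrored linear ID:

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: every thread publishes its linear ID in shared memory,
// then reads the entry written by the thread at the mirrored position.
__global__ void reverseInBlock(int *out)
{
    __shared__ int scratch[1024];   // assumed upper bound on threads per block

    int nThreads = blockDim.x * blockDim.y * blockDim.z;
    int threadID = threadIdx.x +
                   blockDim.x * threadIdx.y +
                   (blockDim.x * blockDim.y) * threadIdx.z;

    scratch[threadID] = threadID;

    // Barrier: no thread continues until all threads in the block have
    // written, whichever warp happened to run first.
    __syncthreads();

    out[threadID] = scratch[nThreads - 1 - threadID];
}
```

Launched as reverseInBlock<<<1, dim3(4, 4, 4)>>>(out) with out pointing to 64 ints of device-accessible memory, element i of out ends up holding 63 - i. Without the __syncthreads() call, a thread could read an entry that the other warp has not written yet.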