CUDA: Thread ID assignment in 2D grid

Posted by 柔情痞子 on 2019-12-18 09:27:43

Question


Let's suppose I have a kernel call with a 2D grid, like so:

dim3 dimGrid(x, y); // not important what the actual values are
dim3 dimBlock(blockSize, blockSize);
myKernel <<< dimGrid, dimBlock >>>();

Now I've read that multidimensional grids are merely meant to ease programming - the underlying hardware will only ever use 1D linearly cached memory (unless you use texture memory, but that's not relevant here).

My question is: In what order will the threads be assigned to the grid indices during warp scheduling? Will they be assigned horizontally ("iterate" x, then y) or vertically ("iterate" y, then x)? This might be relevant to improve memory coalescing, depending on how I access my memory in the kernel.

To make this clearer, let's say the following represents the thread IDs as applied to my (imaginary) grid with a "horizontal" distribution:

[ 0  1  2  3 ]
[ 4  5  6  7 ]
[ 8  9 10 11 ]
[ ...        ]

And "vertical" distribution would be:

[ 0  4  8 .. ]
[ 1  5  9 .. ]
[ 2  6 10 .. ]
[ 3  7 11 .. ]

I hope you can see how this might affect coalescing: With each variant, there will be a specific optimal way to access my device memory buffer.

Unfortunately, I have not found any detailed information on this yet.


Answer 1:


"Horizontal" and "vertical" are arbitrary labels, but threads do have well-defined x, y, and z indices. Within a thread block, threads are grouped into warps in x, y, z order, with x varying fastest. So a 16x16 thread block will have its threads in the following order in the first 32-thread warp:

warp lane: thread ID (x,y,z)

  • 0: 0,0,0
  • 1: 1,0,0
  • 2: 2,0,0
  • 3: 3,0,0
  • ...
  • 15: 15,0,0
  • 16: 0,1,0
  • 17: 1,1,0
  • 18: 2,1,0
  • 19: 3,1,0
  • ...
  • 31: 15,1,0
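
The ordering listed above follows from the usual linearization of a thread's index within its block, x + y * Dx + z * Dx * Dy, where Dx and Dy are the block dimensions. The following is a minimal host-side C++ sketch of that arithmetic, not device code: `Dim3`, `linearThreadId`, `warpOf`, and `laneOf` are hypothetical names introduced here for illustration, and a warp size of 32 is assumed.

```cpp
// Hypothetical stand-in for CUDA's dim3 / threadIdx; not part of the CUDA API.
struct Dim3 { unsigned x, y, z; };

// Assumption: 32 threads per warp, as in the answer above.
constexpr unsigned kWarpSize = 32;

// Linear index of a thread inside its block: x varies fastest, then y, then z.
constexpr unsigned linearThreadId(Dim3 idx, Dim3 blockDim) {
    return idx.x + idx.y * blockDim.x + idx.z * blockDim.x * blockDim.y;
}

// Warp (within the block) that the thread belongs to.
constexpr unsigned warpOf(Dim3 idx, Dim3 blockDim) {
    return linearThreadId(idx, blockDim) / kWarpSize;
}

// Lane (position inside the warp) of the thread.
constexpr unsigned laneOf(Dim3 idx, Dim3 blockDim) {
    return linearThreadId(idx, blockDim) % kWarpSize;
}

// Compile-time checks against the 16x16 example from the answer.
static_assert(laneOf(Dim3{15, 0, 0}, Dim3{16, 16, 1}) == 15, "end of row y=0");
static_assert(laneOf(Dim3{0, 1, 0}, Dim3{16, 16, 1}) == 16, "start of row y=1");
static_assert(warpOf(Dim3{15, 1, 0}, Dim3{16, 16, 1}) == 0, "still in warp 0");
static_assert(warpOf(Dim3{0, 2, 0}, Dim3{16, 16, 1}) == 1, "row y=2 starts warp 1");
```

For coalescing, this means that neighbouring values of threadIdx.x are neighbouring lanes of a warp, so a buffer indexed as row * width + col with col derived from threadIdx.x is read at consecutive addresses within each row of the block.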


Source: https://stackoverflow.com/questions/14257550/cuda-thread-id-assignment-in-2d-grid
