Practice computing grid size for CUDA

♀尐吖头ヾ submitted on 2020-01-22 00:42:49

Question


dim3 block(4, 2);
dim3 grid((nx+block.x-1)/block.x, (ny+block.y-1)/block.y);

I found this code in Professional CUDA C Programming on page 53. It's meant to be a naive example of matrix multiplication. nx is the number of columns and ny is the number of rows.

Can you explain how the grid size is computed? Why is block.x added to nx, and why is 1 then subtracted?

There is a preview (https://books.google.com/books?id=_Z7rnAEACAAJ&printsec=frontcover#v=onepage&q&f=false) but page 53 is missing.


Answer 1:


This is the standard CUDA idiom for determining the minimum number of blocks in each dimension (the "grid") that completely covers the desired input. Mathematically it is ceil(nx/block.x) with real-valued division: figure out how many blocks are needed to cover the desired size, then round up.

But a full floating-point division followed by ceil is more expensive than necessary. Instead, since C integer division truncates toward zero (a "floor" operation for non-negative values), you can add the divisor - 1 before dividing to get the effect of a "ceiling" operation.
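As a rough illustration of the two approaches, here is a minimal host-side C++ sketch. The names `ceil_div` and `ceil_div_float` are hypothetical helpers made up for this example, not part of CUDA or the book, and the integer version assumes that `n + d - 1` does not overflow.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: ceiling of n / d using only integer arithmetic.
// Assumes d > 0 and that n + d - 1 fits in an unsigned int.
inline unsigned int ceil_div(unsigned int n, unsigned int d) {
    assert(d > 0);
    return (n + d - 1) / d;  // integer division truncates, so add d - 1 first
}

// Floating-point version, shown for comparison only.
inline unsigned int ceil_div_float(unsigned int n, unsigned int d) {
    assert(d > 0);
    return static_cast<unsigned int>(std::ceil(static_cast<double>(n) / d));
}
```

For the small sizes used in this thread both functions should agree; the integer form simply avoids the conversion to floating point and the call to `std::ceil`.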

Try a few examples: if nx = 10 and block.x = 4, then nx + block.x - 1 is 13, and by integer division 13/4 is 3, so you need 3 blocks of size 4.

As you noted in the comment, +block.x pushes the floor up to the ceiling, and the -1 handles values of nx that are exact multiples of block.x: e.g. (12 + 4)/4 would be 4, when we actually want (12 + 4 - 1)/4, which is 3.
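To check the arithmetic for both dimensions at once, the sketch below recomputes the grid from the question on the host. `Dim2` and `grid_for` are hypothetical stand-ins invented for this example (`Dim2` plays the role of CUDA's `dim3` so the code compiles without the CUDA toolkit); only the rounding expression itself comes from the question.

```cpp
#include <cassert>

// Hypothetical stand-in for CUDA's dim3, limited to two dimensions.
struct Dim2 {
    unsigned int x;
    unsigned int y;
};

// Smallest grid (in blocks) whose threads cover an nx-by-ny input.
// Assumes block.x and block.y are non-zero.
inline Dim2 grid_for(unsigned int nx, unsigned int ny, Dim2 block) {
    assert(block.x > 0 && block.y > 0);
    return Dim2{(nx + block.x - 1) / block.x,   // blocks along the columns
                (ny + block.y - 1) / block.y};  // blocks along the rows
}
```

With `block = {4, 2}`, an input of nx = 10, ny = 5 should give a 3 x 3 grid, while nx = 12, ny = 4 should give 3 x 2, with no extra block for the exact multiples. When the sizes are not exact multiples, the grid launches more threads than there are elements (12 x 6 threads for the 10 x 5 input), which is why kernels commonly check their computed index against nx and ny before touching memory.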



Source: https://stackoverflow.com/questions/30584311/practice-computing-grid-size-for-cuda
