I have basically the same question as the one posed in this discussion. In particular, I want to refer to this final response:
I think there are two different que
There is definitely a benefit to using a multi-dimensional grid. The individual entries (tid, ctaid) are read-only variables visible as special registers. See the PTX ISA:
PTX includes a number of predefined, read-only variables, which are visible as special registers and accessed through mov or cvt instructions. The special registers are:
%tid %ntid %laneid %warpid %nwarpid %ctaid %nctaid
If some of these values can be used without further processing, not only do you save arithmetic instructions, potentially at each indexing step of multi-dimensional data, but more importantly you save registers, which are a very scarce resource on any hardware.
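To make the point concrete, here is a minimal sketch comparing the two launch styles. The kernel names (`fill2d`, `fill1d`), the array sizes, and the fill pattern are made up for illustration and are not from the original discussion. In the 2D version, the row and column come directly from the built-in index variables (which map to %tid, %ntid and %ctaid), while the 1D version has to recover them with an integer division and a modulo.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// 2D launch: row and col are built from the y and x components of the
// built-in index variables, one multiply-add per axis.
__global__ void fill2d(int *out, int width, int height)
{
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    if (col < width && row < height)
        out[row * width + col] = row + col;
}

// 1D launch over the same data: row and col must be recovered from the
// flat index with an integer division and a modulo per thread.
__global__ void fill1d(int *out, int width, int height)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < width * height) {
        int row = idx / width;
        int col = idx % width;
        out[idx] = row + col;
    }
}

int main()
{
    const int width = 100, height = 60;   // hypothetical sizes
    const int n = width * height;
    int *a = nullptr, *b = nullptr;

    if (cudaMallocManaged(&a, n * sizeof(int)) != cudaSuccess ||
        cudaMallocManaged(&b, n * sizeof(int)) != cudaSuccess) {
        fprintf(stderr, "allocation failed\n");
        return 1;
    }

    dim3 block(16, 16);
    dim3 grid((width + block.x - 1) / block.x,
              (height + block.y - 1) / block.y);
    fill2d<<<grid, block>>>(a, width, height);
    fill1d<<<(n + 255) / 256, 256>>>(b, width, height);

    if (cudaDeviceSynchronize() != cudaSuccess) {
        fprintf(stderr, "kernel execution failed\n");
        return 1;
    }

    // Both kernels should produce identical contents.
    int mismatches = 0;
    for (int i = 0; i < n; ++i)
        if (a[i] != b[i]) ++mismatches;
    printf("mismatches: %d\n", mismatches);

    cudaFree(a);
    cudaFree(b);
    return mismatches != 0;
}
```

Whether the 2D version really ends up with fewer instructions or registers depends on the compiler and the target architecture, so it is worth checking the actual numbers, for example by compiling with `nvcc --ptxas-options=-v` and comparing the reported register usage of the two kernels.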