What is happening with this CUDA code that returns this unexpected output?

感动是毒 2021-01-26 12:32

Having finally gotten Dynamic Parallelism up and running, I'm now trying to implement my model with it. It took me a while to figure out that some strange output resulted from

1 Answer
  •  北海茫月
    2021-01-26 13:13

    You probably don't need to synchronize with the parent kernel. Child kernels execute in the order specified by the parent kernel, and the end of the parent kernel is an implicit synchronization point with the last child kernel.
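To illustrate the ordering and the implicit synchronization described above, here is a minimal sketch. The kernel names (`fill`, `addOne`, `parent`) and the sizes are hypothetical and not taken from the question. It assumes a device of compute capability 3.5 or higher and compilation with relocatable device code, for example `nvcc -rdc=true -lcudadevrt`.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical child kernel: writes a constant into every element.
__global__ void fill(int *data, int n, int value) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = value;
}

// Hypothetical child kernel: increments every element.
__global__ void addOne(int *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

// Hypothetical parent kernel, launched with a single thread.
// Both children go into the same default stream of the launching thread,
// so addOne runs after fill. The parent does not synchronize explicitly.
__global__ void parent(int *data, int n) {
    int blocks = (n + 127) / 128;
    fill<<<blocks, 128>>>(data, n, 41);
    addOne<<<blocks, 128>>>(data, n);
}

int main() {
    const int n = 1024;
    int *d = nullptr;
    if (cudaMalloc(&d, n * sizeof(int)) != cudaSuccess) return 1;

    parent<<<1, 1>>>(d, n);

    // The parent grid is not complete until its children are complete,
    // so this host-side wait also covers both child kernels.
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        printf("error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    std::vector<int> h(n);
    cudaMemcpy(h.data(), d, n * sizeof(int), cudaMemcpyDeviceToHost);

    // Count elements that do not hold 41 + 1.
    int mismatches = 0;
    for (int v : h) {
        if (v != 42) ++mismatches;
    }
    printf("mismatches: %d\n", mismatches);

    cudaFree(d);
    return 0;
}
```

If the parent needed to read the children's results itself, the picture would change. This sketch only covers the case where the host consumes the output.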

    When you use dynamic parallelism, be careful about these items:

    1. The maximum nesting depth is 24 (CC=3.5).

    2. The number of dynamic kernels pending launch at the same time is limited (default 2048 at CC=3.5), but it can be increased.

    3. Keep the parent kernel busy after the child kernel call; otherwise there is a good chance you will waste resources.

    I guess your strange, wrong results originate from the second factor mentioned above. When you hit the limit, some of the dynamic kernels simply don't run, and if you don't check for errors you won't notice, because errors are recorded per thread.

    You can increase this limit by calling cudaDeviceSetLimit() with cudaLimitDevRuntimePendingLaunchCount as the limit. But the higher you set it, the more global memory is consumed. Have a look at section C.4.3.1.3 of the documentation.
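As a rough sketch of both suggestions, the code below raises the pending-launch limit on the host and checks the launch result in each launching thread. The kernels `child` and `parent`, the `failed` counter, and the value 4096 are assumptions made for illustration, not part of the original question. It assumes compute capability 3.5 or higher and compilation with `nvcc -rdc=true -lcudadevrt`.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical child kernel: records that it ran.
__global__ void child(int *out, int idx) {
    out[idx] = 1;
}

// Hypothetical parent kernel: every thread launches one child.
__global__ void parent(int *out, int *failed, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    child<<<1, 1>>>(out, i);

    // Launch errors are recorded per thread, so the launching thread
    // itself has to check, otherwise a dropped launch goes unnoticed.
    if (cudaGetLastError() != cudaSuccess) {
        atomicAdd(failed, 1);
    }
}

int main() {
    const int n = 4096;  // assumed value, above the default limit of 2048

    // Raise the limit before launching any kernel that uses the device
    // runtime. A larger value reserves more global memory.
    cudaError_t err = cudaDeviceSetLimit(cudaLimitDevRuntimePendingLaunchCount, n);
    if (err != cudaSuccess) {
        printf("cudaDeviceSetLimit: %s\n", cudaGetErrorString(err));
        return 1;
    }

    int *out = nullptr;
    int *failed = nullptr;
    if (cudaMalloc(&out, n * sizeof(int)) != cudaSuccess) return 1;
    if (cudaMalloc(&failed, sizeof(int)) != cudaSuccess) return 1;
    cudaMemset(out, 0, n * sizeof(int));
    cudaMemset(failed, 0, sizeof(int));

    parent<<<(n + 255) / 256, 256>>>(out, failed, n);
    err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        printf("parent kernel: %s\n", cudaGetErrorString(err));
        return 1;
    }

    int hostFailed = 0;
    cudaMemcpy(&hostFailed, failed, sizeof(int), cudaMemcpyDeviceToHost);
    printf("failed child launches: %d\n", hostFailed);

    cudaFree(out);
    cudaFree(failed);
    return 0;
}
```

Removing the `cudaDeviceSetLimit` call and watching the `failed` counter is one way to see whether the pending-launch limit is the cause of missing results on a given device.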
