How to understand “All threads in a warp execute the same instruction at the same time.” in GPU?

Asked by 天命终不由人 on 2021-01-07 00:04

I am reading Professional CUDA C Programming, and in the GPU Architecture Overview section it says:

    CUDA employs a Single Instruction Multiple Thread (SIMT) architecture to manage and execute threads in groups of 32 called warps. All threads in a warp execute the same instruction at the same time.

If all threads in a warp execute the same instruction at the same time, how can this work when the code contains a conditional branch (for example, an if/else) and different threads need to take different paths?

1 Answer

Answered by 花落未央 on 2021-01-07 00:36

There is no contradiction. All threads in a warp execute the same instruction in lock-step at all times. To support conditional execution and branching, CUDA introduces two concepts in the SIMT model:

1. Predicated execution (sketched in the first example below)
2. Instruction replay/serialisation (sketched in the second example below)
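
As a concrete illustration of point 1 (my own example, not from the original answer), the kernel below contains the kind of short, simple conditional that the compiler typically lowers to predicated instructions rather than a branch. Every lane in the warp issues the guarded instruction; lanes whose predicate is false simply have no effect. Whether predication is actually used is the compiler's decision, so read this as a sketch of the concept:

__global__ void clamp_negatives(float *x, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    // A short guarded body like this is a typical predication candidate:
    // all 32 lanes issue the store, but lanes where x[i] < 0.0f is false
    // are masked off, so their store does nothing. No branch is taken.
    if (x[i] < 0.0f)
        x[i] = 0.0f;
}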

Predicated execution means that the result of a conditional instruction can be used to mask off threads from executing a subsequent instruction, without a branch. Instruction replay is how a classic conditional branch is dealt with: all threads execute every branch of the conditionally executed code by replaying the instructions. Threads which do not follow a particular execution path are masked off and execute the equivalent of a NOP. This is the so-called branch divergence penalty in CUDA, and it can have a significant impact on performance.
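
To make the replay mechanism and its cost concrete, here is a minimal sketch (again my own example; the constants are arbitrary). In divergent, even and odd lanes of the same warp take different paths, so every warp executes both branches in turn, masking off the inactive lanes each time. In warp_uniform, the condition evaluates identically for all 32 lanes of any given warp, so each warp follows a single path and nothing is replayed. (With bodies this trivial the compiler may choose predication anyway; the branching structure is what the example is meant to show.)

__global__ void divergent(float *out)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    float v;
    if (tid % 2 == 0)
        v = 100.0f;   // executed while the odd lanes are masked off (NOPs)
    else
        v = 200.0f;   // replayed while the even lanes are masked off (NOPs)
    out[tid] = v;     // all lanes reconverge here
}

__global__ void warp_uniform(float *out)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    float v;
    // (tid / warpSize) is the same for every lane of a warp, so the whole
    // warp takes one branch together and pays no divergence penalty.
    if ((tid / warpSize) % 2 == 0)
        v = 100.0f;
    else
        v = 200.0f;
    out[tid] = v;
}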

    This is how lock-step execution can support branching.
