I am reading Professional CUDA C Programming, and in the GPU Architecture Overview section it says:
CUDA employs a Single Instruction Multiple Thread (SIMT) architecture to manage and execute threads in groups of 32 called warps.
There is no contradiction. All threads in a warp execute the same instruction in lock-step at all times. To support conditional execution and branching, CUDA introduces two concepts in the SIMT model: predicated execution and instruction replay.
Predicated execution means that the result of a conditional instruction can be used to mask off threads from executing a subsequent instruction, without a branch. Instruction replay is how a classic conditional branch is handled: all threads in the warp execute every taken branch of the conditional code by replaying the instructions, and threads that do not follow a particular execution path are masked off and execute the equivalent of a NOP. This replay is the source of the so-called branch divergence penalty in CUDA, which can have a significant impact on performance.
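To make instruction replay concrete, here is a minimal sketch (my own example, not from the book) of a kernel whose branch splits every warp: even and odd lanes take different paths, so the warp runs both paths with the inactive lanes masked off. The names `divergentKernel` and `d_out` are purely illustrative.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Every warp contains both even and odd lanes, so this branch diverges:
// the warp executes the "if" body with odd lanes masked, then the "else"
// body with even lanes masked.
__global__ void divergentKernel(float *out)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;

    if (tid % 2 == 0) {
        // Even lanes are active here; odd lanes execute the equivalent of a NOP.
        out[tid] = 100.0f;
    } else {
        // Odd lanes are active here; even lanes are masked off.
        out[tid] = 200.0f;
    }
}

int main()
{
    const int n = 64;               // two warps of 32 threads
    float *d_out;
    cudaMalloc(&d_out, n * sizeof(float));

    divergentKernel<<<1, n>>>(d_out);
    cudaDeviceSynchronize();

    float h_out[n];
    cudaMemcpy(h_out, d_out, n * sizeof(float), cudaMemcpyDeviceToHost);
    printf("out[0] = %.1f, out[1] = %.1f\n", h_out[0], h_out[1]);

    cudaFree(d_out);
    return 0;
}
```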
This is how lock-step execution can support branching.
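For predicated execution, a short conditional assignment like the sketch below can typically be compiled to a predicated instruction (or a select) instead of an actual branch, so no replay is needed even though lanes disagree on the condition. Whether the compiler actually predicates it is its decision; you can check by inspecting the generated SASS with cuobjdump. `predicatedKernel` is again just an illustrative name.

```cuda
// Minimal sketch: a short conditional assignment that the compiler can
// usually implement with predication or a select rather than a branch.
__global__ void predicatedKernel(const float *in, float *out, int n)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < n) {
        float v = in[tid];
        // Every lane evaluates the comparison; the choice of value to store
        // can be predicated on the result, with no divergent branch.
        out[tid] = (v > 0.0f) ? v : 0.0f;
    }
}
```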