I wanted to get a feel for Kepler's architecture, but it doesn't make sense to me.
If a warp is 32 threads, and 4 warps get scheduled/executed at a time, that would mean 128 cores are in use and 64 are left idle. The whitepaper says something about independent instructions, so are the 64 cores reserved for those instructions?
If so, can someone give me an example of when an independent instruction would be needed?
Each SM in Kepler has 192 SP cores and 4 warp schedulers. Each warp scheduler is capable of dual-issue, which means that under some circumstances it can issue 2 instructions from the same warp in a single issue slot.
One of these circumstances is that the instructions should be independent, which roughly speaking means that neither instruction depends on the output of the other instruction.
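To make "independent" concrete, here is a minimal sketch in plain C++ (the same source-level pattern applies to the body of a CUDA kernel). The function names are made up for illustration, and whether a given pair of instructions is actually dual-issued depends on the compiler and the hardware, so treat this only as an illustration of the dependency structure:

```cpp
#include <cstddef>

// Dependent chain: every step needs the result of the previous one,
// so two consecutive steps can never be issued as an independent pair.
float dependent_chain(const float* x, std::size_t n) {
    float acc = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        acc = acc * 0.5f + x[i];  // reads the acc written just before
    }
    return acc;
}

// Two accumulators: the update of `a` and the update of `b` in one
// iteration do not read each other's output, so they are independent
// and are candidates for being issued together.
float independent_sums(const float* x, std::size_t n) {
    float a = 0.0f;
    float b = 0.0f;
    std::size_t i = 0;
    for (; i + 1 < n; i += 2) {
        a += x[i];      // independent of the next line
        b += x[i + 1];  // independent of the previous line
    }
    if (i < n) {
        a += x[i];      // leftover element when n is odd
    }
    return a + b;
}
```

In the second function a scheduler has, in principle, two instructions from the same warp that it could issue in the same slot. In the first it has to wait for each result before the next step can start.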
With 4 warp schedulers, each potentially capable of dual-issue, it's theoretically possible to issue up to 8 warp instructions in a single issue slot. At 32 threads per warp instruction that is 8 × 32 = 256 thread-level operations, which is at least theoretically enough to keep 192 SP cores busy.
An SM also has execution units other than the SP units (the ones commonly referred to as "cores"), such as double-precision, special-function, and load/store units, so the actual instruction mix will determine which execution units are scheduled in any given issue slot.
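The arithmetic behind this can be written out as a small compile-time check. This is only a back-of-the-envelope model using the numbers quoted in this answer; the constant and function names are invented here, and it ignores latencies, the other execution units, and any other issue restrictions:

```cpp
// Numbers quoted above for one Kepler SM.
constexpr int kWarpSize        = 32;   // threads per warp
constexpr int kSchedulersPerSM = 4;    // warp schedulers
constexpr int kSpCoresPerSM    = 192;  // SP "cores"

// Thread-level operations that can be issued in one slot if every
// scheduler issues `instr_per_scheduler` warp instructions.
constexpr int lanes_per_slot(int instr_per_scheduler) {
    return kSchedulersPerSM * instr_per_scheduler * kWarpSize;
}

// Warp instructions needed in one slot to cover all SP cores.
constexpr int warp_instr_to_fill_sp() {
    return kSpCoresPerSM / kWarpSize;
}

// Single issue only: 4 * 1 * 32 = 128, which leaves 64 SP cores idle.
static_assert(lanes_per_slot(1) == 128, "single-issue covers 128 lanes");
// Dual issue everywhere: 4 * 2 * 32 = 256, more than the 192 SP cores.
static_assert(lanes_per_slot(2) == 256, "dual-issue covers 256 lanes");
// 192 / 32 = 6 warp instructions are needed to fill the SP cores.
static_assert(warp_instr_to_fill_sp() == 6, "6 warp instructions fill 192 cores");
```

So under this simplified model the "missing" 64 cores from the question are not reserved for anything. They are used whenever at least two of the four schedulers manage to dual-issue SP instructions, bringing the total from 4 to 6 warp instructions in that slot.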
You can get a more detailed description in the GK110 whitepaper.
Source: https://stackoverflow.com/questions/26081418/why-does-the-gk110-have-192-cores-and-4-warps