Understanding Streaming Multiprocessors (SM) and Streaming Processors (SP)

一个人的身影 2021-02-06 16:24

I am trying to understand the basic architecture of a GPU. I have gone through a lot of material, including this very good SO answer, but I am still confused and not able to get a go…

1 Answer
  • 2021-02-06 17:28

    First, some comments on the "My understanding" portion of the question:

    • The number of SMs depends on the GPU model: there are low-end models with just one SM, and high-end ones with dozens (30 on the GTX 280, 108 on the A100). Compute capability defines what those SMs are capable of, but not how many SMs a GPU has.
    • Each thread block is assigned to an SM, not to an SP. Multiple thread blocks can run on a given SM at the same time, subject to its resource limits.
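One practical consequence of the SM count varying by model is how many rounds of thread blocks a grid needs. Below is a minimal Python sketch (not CUDA code) of that arithmetic. The function name `waves` and all numbers are hypothetical, and it assumes every SM holds the same number of resident blocks and every block takes the same time, which real hardware does not guarantee.

```python
def waves(num_blocks: int, num_sms: int, blocks_per_sm: int) -> int:
    """Estimate how many rounds ("waves") of resident blocks a grid needs.

    Simplifying assumptions (hypothetical model): every SM holds the same
    number of resident blocks, and all blocks take the same time to finish.
    """
    if num_blocks < 0 or num_sms <= 0 or blocks_per_sm <= 0:
        raise ValueError("num_blocks must be >= 0; num_sms and blocks_per_sm > 0")
    concurrent = num_sms * blocks_per_sm  # blocks resident on the whole GPU at once
    return -(-num_blocks // concurrent)   # integer ceiling division


# Hypothetical comparison: the same 1000-block grid, 4 resident blocks per SM.
print(waves(1000, num_sms=1, blocks_per_sm=4))   # 250
print(waves(1000, num_sms=80, blocks_per_sm=4))  # 4
```

The grid itself does not change between GPUs; only the number of SMs available to host its blocks does.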

    On to the diagram:

    • Orange boxes are indeed SMs, just as they are labeled. Each SM has a shared memory pool, divided among all thread blocks running on that SM.
    • Blue boxes are SPs. Since an SP is a scalar lane, it runs one thread, and each thread is given its own set of registers, again just as the diagram shows.
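The "one thread per scalar lane" view connects to warps through simple index arithmetic. This is a minimal Python sketch assuming a 1D thread block and a warp size of 32 (the value of CUDA's built-in `warpSize` on NVIDIA GPUs to date); the helper names `warp_and_lane` and `warps_per_block` are hypothetical.

```python
from typing import Tuple

WARP_SIZE = 32  # assumed: value of CUDA's built-in warpSize on NVIDIA GPUs so far


def warp_and_lane(thread_idx: int) -> Tuple[int, int]:
    """Map a linear thread index within a 1D block to (warp index, lane index)."""
    return thread_idx // WARP_SIZE, thread_idx % WARP_SIZE


def warps_per_block(block_dim: int) -> int:
    """Warps a 1D block occupies; a partially filled last warp still counts."""
    return -(-block_dim // WARP_SIZE)


for t in (0, 31, 32, 95):
    print(t, warp_and_lane(t))  # 0 (0, 0) / 31 (0, 31) / 32 (1, 0) / 95 (2, 31)
print(warps_per_block(96), warps_per_block(100))  # 3 4
```

A block of 100 threads therefore still occupies 4 warps, with only 4 active lanes in the last one.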

    Addressing the follow-up question:

    • Each SM can have multiple resident thread blocks. The maximum number of thread blocks that can be resident on an SM is determined by compute capability. The achieved number can be lower than that maximum when it is limited by the number of registers or the amount of shared memory each thread block consumes, or by the per-SM limit on resident threads.
    • The SM then schedules instructions from all warps resident on it, picking among warps that have instructions ready for execution; those warps may come from any thread block resident on that SM. You generally want many warps resident, so that at any given moment the SPs can be kept busy running instructions from whichever warps are ready.
    • The number of cores per SM is not a very useful metric, and you need not think too much about it at this point.
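The "achieved number of resident blocks" is the tightest of several per-SM limits. Here is a simplified Python sketch of that calculation. The function `resident_blocks` and the `EXAMPLE_SM` limits are hypothetical illustrative values, not any specific GPU, and the sketch ignores register and shared-memory allocation granularity, which real occupancy calculators account for.

```python
WARP_SIZE = 32

# Hypothetical per-SM limits for illustration only; real values depend on
# compute capability.
EXAMPLE_SM = dict(
    max_blocks_per_sm=16,
    max_threads_per_sm=2048,
    regs_per_sm=65536,
    smem_per_sm=65536,  # bytes
)


def resident_blocks(threads_per_block, regs_per_thread, smem_per_block, *,
                    max_blocks_per_sm, max_threads_per_sm, regs_per_sm, smem_per_sm):
    """Blocks resident on one SM = the tightest of several per-SM limits.

    Simplified: ignores allocation granularity and any shared memory
    reserved per block by the system.
    """
    warps = -(-threads_per_block // WARP_SIZE)  # partial warps count as whole warps
    limits = [
        max_blocks_per_sm,                           # hard cap from compute capability
        (max_threads_per_sm // WARP_SIZE) // warps,  # resident warp slots
    ]
    if regs_per_thread > 0:  # register file shared by all resident threads
        limits.append(regs_per_sm // (regs_per_thread * warps * WARP_SIZE))
    if smem_per_block > 0:   # shared memory pool divided among resident blocks
        limits.append(smem_per_sm // smem_per_block)
    return min(limits)


# 256 threads per block in every case:
print(resident_blocks(256, 32, 8192, **EXAMPLE_SM))   # 8 (warps, regs, smem all allow 8)
print(resident_blocks(256, 64, 8192, **EXAMPLE_SM))   # 4 (register-limited)
print(resident_blocks(256, 32, 24576, **EXAMPLE_SM))  # 2 (shared-memory-limited)
```

Multiplying the result by the warps per block gives the number of resident warps the scheduler can pick from (64 in the first case, 16 in the last). For real kernels, the CUDA runtime's `cudaOccupancyMaxActiveBlocksPerMultiprocessor` performs this calculation with the details this sketch leaves out.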