What is the relationship between a CUDA core, a streaming multiprocessor and the CUDA model of blocks and threads?
What gets mapped to what and what is parallelized
There are multiple streaming multiprocessor on one device.
A SM may contain multiple blocks. Each block may contain several threads.
A SM have multiple CUDA cores(as a developer, you should not care about this because it is abstracted by warp), which will work on thread. SM always working on warp of threads(always 32). A warp will only working on thread from same block.
SM and block both have limits on number of thread, number of register and shared memory.