GPU Shared Memory Bank Conflict

后端 未结 2 812
一向
一向 2021-02-04 12:46

I am trying to understand how bank conflicts take place.
if i have an array of size 256 in global memory and i have 256 threads in a single Block, and i want to copy the arr

2条回答
  •  -上瘾入骨i
    2021-02-04 12:55

    The best way to check this would be to profile your code using the "Compute Visual Profiler"; this comes with the CUDA Toolkit. Also there's a great section in GPU Gems 3 on this - "39.2.3 Avoiding Bank Conflicts".

    "When multiple threads in the same warp access the same bank, a bank conflict occurs unless all threads of the warp access the same address within the same 32-bit word" - First thing there are 16 memory banks each 4bytes wide. So essentially, if you have any thread in a half warp reading memory from the same 4bytes in a shared memory bank, you're going to have bank conflicts and serialization etc.

    OK so your first example:

    First lets assume your arrays are say for example of the type int (a 32-bit word). Your code saves these ints into shared memory, across any half warp the Kth thread is saving to the Kth memory bank. So for example thread 0 of the first half warp will save to shared_a[0] which is in the first memory bank, thread 1 will save to shared_a[1], each half warp has 16 threads these map to the 16 4byte banks. In the next half warp, the first thread will now save its value into shared_a[16] which is in the first memory bank again. So if you use a 4byte word such int, float etc then your first example will not result in a bank conflict. If you use a 1 byte word such as char, in the first half warp threads 0, 1, 2 and 3 will all save their values to the first bank of shared memory which will cause a bank conflict.

    Second example:

    Again this will all depend on the size of the word you are using, but for the example I'll use a 4byte word. So looking at the first half warp:

    Number of threads = 32

    N = 64

    Thread 0: Will write to 0, 31, 63 Thread 1: Will write to 1, 32

    All threads across the half warp execute concurrently so the writes to shared memory shouldn't cause bank conflicts. I'll have to double check this one though.

    Hope this helps, sorry for the huge reply!

提交回复
热议问题