I am trying to understand how bank conflicts take place.
If I have an array of size 256 in global memory and 256 threads in a single block, and I want to copy the array to shared memory so that every thread copies one element, does that access pattern cause a bank conflict? Concretely, I mean something like the sketch below.
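Just to make the access pattern explicit, here is a minimal sketch of what I mean (the float element type and the name copyToShared are placeholders of mine, nothing above fixes them):

__global__ void copyToShared(const float* global_a)
{
    // Statically sized shared array: one element per thread of the 256-thread block.
    __shared__ float shared_a[256];

    // Thread i copies element i, so consecutive threads in a warp
    // write to consecutive shared-memory addresses.
    shared_a[threadIdx.x] = global_a[threadIdx.x];
    __syncthreads();

    // ... the rest of the kernel would work with shared_a ...
}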
In both cases the threads access shared memory at consecutive addresses. Whether a conflict occurs depends on the size of the elements stored in shared memory, but consecutive access by a warp of threads does not result in a bank conflict for "small" element sizes.
Profiling this code with the NVIDIA Visual Profiler shows that for element sizes smaller than 32 that are multiples of 4 (4, 8, 12, ..., 28), consecutive access to shared memory does not result in a bank conflict. An element size of 32, however, does result in bank conflicts.
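If you want to reproduce that kind of measurement, here is a sketch of one way to do it (the names Elem and consecutiveAccess are illustrative, not from the profiled code, and I am assuming the element sizes above are in bytes):

// Illustrative only: an element whose size in bytes is the template parameter N.
// Instantiating and profiling this kernel for different N is one way to
// reproduce the observation above.
template <int N>
struct Elem {
    char bytes[N];
};

template <int N>
__global__ void consecutiveAccess(const Elem<N>* in, Elem<N>* out)
{
    __shared__ Elem<N> buf[256];

    // Consecutive threads touch consecutive shared-memory elements.
    buf[threadIdx.x] = in[threadIdx.x];
    __syncthreads();
    out[threadIdx.x] = buf[threadIdx.x];
}

// Example launches from host code:
//   consecutiveAccess<4><<<1, 256>>>(d_in4, d_out4);    // no conflicts reported
//   consecutiveAccess<32><<<1, 256>>>(d_in32, d_out32); // shared-memory bank conflicts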
The answer by Ljdawson contains some outdated information:
... If you use a 1 byte word such as char, in the first half warp threads 0, 1, 2 and 3 will all save their values to the first bank of shared memory which will cause a bank conflict.
This may be true for old GPUs, but on recent GPUs with compute capability >= 2.x such accesses do not cause bank conflicts, thanks to the broadcast mechanism (link). The following quote is from the CUDA C Programming Guide (v8.0.61), G.3.3 Shared Memory:
A shared memory request for a warp does not generate a bank conflict between two threads that access any address within the same 32-bit word (even though the two addresses fall in the same bank): In that case, for read accesses, the word is broadcast to the requesting threads (multiple words can be broadcast in a single transaction) and for write accesses, each address is written by only one of the threads (which thread performs the write is undefined).
This means, in particular, that there are no bank conflicts if an array of char is accessed as follows, for example:
extern __shared__ char shared[];
char data = shared[BaseIndex + tid];
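Wrapped into a complete (if artificial) kernel, assuming BaseIndex is simply passed in and stays small enough that all reads fall inside the dynamic shared-memory allocation given at launch, that access looks like this:

__global__ void charAccess(char* out, int BaseIndex)
{
    extern __shared__ char shared[];
    int tid = threadIdx.x;

    // Fill shared memory first so the read below has defined contents.
    shared[tid] = (char)tid;
    __syncthreads();

    // Four consecutive threads read chars from the same 32-bit word;
    // on cc >= 2.x that word is broadcast, so no bank conflict occurs.
    char data = shared[BaseIndex + tid];
    out[tid] = data;
}

// Example launch: 256 threads, 256 bytes of dynamic shared memory, BaseIndex = 0:
//   charAccess<<<1, 256, 256>>>(d_out, 0);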