How is 2D Shared Memory arranged in CUDA

南笙 2020-12-13 07:59

I've always worked with linear shared memory (load, store, access neighbours), but I made a simple 2D test to study bank conflicts, and its results have confused me.

1 Answer
  • 2020-12-13 08:14

    Yes, shared memory is arranged in row-major order, as you expected. So your [16][16] array is stored row-wise, something like this:

           bank0 .... bank15
    row 0  [ 0   .... 15  ]
        1  [ 16  .... 31  ]
        2  [ 32  .... 47  ]
        3  [ 48  .... 63  ]
        4  [ 64  .... 79  ]
        5  [ 80  .... 95  ]
        6  [ 96  .... 111 ]
        7  [ 112 .... 127 ]
        8  [ 128 .... 143 ]
        9  [ 144 .... 159 ]
        10 [ 160 .... 175 ]
        11 [ 176 .... 191 ]
        12 [ 192 .... 207 ]
        13 [ 208 .... 223 ]
        14 [ 224 .... 239 ]
        15 [ 240 .... 255 ]
           col 0 .... col 15
    

    Because pre-Fermi hardware has 16 shared memory banks, each 32 bits wide, every integer entry in a given column maps onto the same shared memory bank. So how does that interact with your choice of indexing scheme?
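    The column-to-bank mapping in the diagram can be checked with a little arithmetic. The following is only a host-side C++ sketch, not device code; the names `kNumBanks`, `kCols` and `bankOf` are made up for illustration, and it assumes 4-byte ints and an array that starts at bank 0.

```cpp
#include <cassert>

// Host-side model only. Assumptions: 16 banks, each one 32-bit word wide
// (pre-Fermi), and a [16][16] array of 4-byte ints starting at bank 0.
const int kNumBanks = 16;  // illustrative name
const int kCols = 16;      // row length of the [16][16] array

// Returns the bank that holds element [row][col] in this model.
int bankOf(int row, int col) {
    assert(row >= 0 && col >= 0 && col < kCols);
    int wordOffset = row * kCols + col;  // row-major word offset
    return wordOffset % kNumBanks;       // consecutive words, consecutive banks
}
```

    Because the row length equals the bank count here, `bankOf(row, col)` reduces to `col`: every entry of a column lands in one bank, whatever the row.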

    The thing to keep in mind is that threads within a block are numbered in the equivalent of column-major order (technically, the x dimension of the block is the fastest varying, followed by y, then z). So when you use this indexing scheme:

    shData[threadIdx.x][threadIdx.y]
    

    threads within a half-warp read from the same column, and therefore from the same shared memory bank, so bank conflicts occur. When you use the opposite scheme:

    shData[threadIdx.y][threadIdx.x]
    

    threads within the same half-warp read from the same row, which means each thread reads from a different one of the 16 shared memory banks, so no conflicts occur.
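    The two indexing schemes can be compared by counting, for one half-warp, how many threads land in the same bank. This is a rough host-side C++ model rather than a kernel; it assumes a 16x16 thread block, a [16][16] int array starting at bank 0, and 16 banks of 32-bit words. The names `kN`, `kNumBanks`, `kHalfWarp` and `conflictDegree` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>

// Host-side model only. Assumptions: a 16x16 thread block, a [16][16] int
// array starting at bank 0, and 16 banks of 32-bit words (pre-Fermi).
const int kN = 16;         // block and array edge length (illustrative name)
const int kNumBanks = 16;  // bank count
const int kHalfWarp = 16;  // threads per half-warp

// Largest number of threads in one half-warp that hit the same bank.
// xFirst == true  models shData[threadIdx.x][threadIdx.y];
// xFirst == false models shData[threadIdx.y][threadIdx.x].
int conflictDegree(int halfWarp, bool xFirst) {
    assert(halfWarp >= 0 && halfWarp < kN * kN / kHalfWarp);
    int hits[kNumBanks] = {0};
    for (int lane = 0; lane < kHalfWarp; ++lane) {
        // Linear thread number; x varies fastest, so tid = x + y * blockDim.x.
        int tid = halfWarp * kHalfWarp + lane;
        int x = tid % kN;
        int y = tid / kN;
        int row = xFirst ? x : y;
        int col = xFirst ? y : x;
        ++hits[(row * kN + col) % kNumBanks];  // row-major word offset -> bank
    }
    return *std::max_element(hits, hits + kNumBanks);
}
```

    In this model `conflictDegree(h, true)` is 16 for every half-warp `h` (all 16 threads hit one bank), while `conflictDegree(h, false)` is 1 (each thread gets its own bank).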
