Let's say I have several threads that access memory at addresses A+0, A+4, A+8, A+12 (each consecutive thread takes the next address). Such access is coalesced, right?
However, if the threads access those same addresses in reverse or otherwise permuted order, is the access still coalesced?
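For concreteness, here is a rough sketch of the pattern I mean (the kernel and variable names are just illustrative, assuming A holds 4-byte ints):

```
// Each thread reads the 4-byte word at A + 4*tid, i.e. consecutive threads
// touch consecutive addresses. A warp's 32 loads cover one contiguous,
// 128-byte-aligned segment (assuming A itself is suitably aligned), so the
// warp's request should coalesce into a single memory transaction.
__global__ void sequential_read(const int *A, int *out, int n)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < n)
        out[tid] = A[tid];   // thread 0 -> A+0, thread 1 -> A+4, thread 2 -> A+8, ...
}
```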
It's also worth noting that one of the main purposes of the L2 cache in an NVIDIA GPU is to collapse reads and coalesce writes. So if one warp were accessing
thread 0 -> A+0
thread 1 -> A+8
thread 2 -> A+16
thread 3 -> A+24
...
and another warp was accessing
thread 0 -> A+4
thread 1 -> A+12
thread 2 -> A+20
thread 3 -> A+28
...
these two accesses would not coalesce inside the SM, but they would generally coalesce in the L2 cache, so GPU memory is only touched once.
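To make that two-warp pattern concrete, here is a hypothetical kernel sketch of mine (assuming A holds 4-byte ints, so A+8 is element 2, A+16 is element 4, and so on, and assuming a block size that is a multiple of 64):

```
// Even-numbered warps read words 0, 2, 4, ... of their range (byte offsets
// 0, 8, 16, ...); odd-numbered warps read words 1, 3, 5, ... (byte offsets
// 4, 12, 20, ...). Each warp's request has an 8-byte stride, so it does not
// fully coalesce at the SM, but the two warps of a pair touch the same
// 128-byte lines, which the L2 can serve from a single DRAM access.
__global__ void interleaved_by_warp(const int *A, int *out, int n)
{
    int warp = threadIdx.x / 32;   // warp index within the block
    int lane = threadIdx.x % 32;   // lane index within the warp
    // each pair of warps covers 64 consecutive words
    int pair = blockIdx.x * (blockDim.x / 64) + warp / 2;
    // even warp -> even words, odd warp -> odd words
    int word = pair * 64 + 2 * lane + (warp % 2);
    if (word < n)
        out[word] = A[word];
}
```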
Yes, for cc 2.0 and newer GPUs, coalescing will occur for any arbitrary assignment of 32-bit data elements to threads, as long as all the requested 32-bit elements come from the same 128-byte (and 128-byte-aligned) region in global memory.
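As an illustration of that case, here is a hypothetical kernel sketch of mine (names are not from any particular source), in which each warp reads its 128-byte segment in reverse order:

```
// Each warp reads one 128-byte-aligned segment (32 consecutive ints), but
// lane 0 takes the last word of the segment, lane 1 the next-to-last, and so
// on. On cc 2.0+ this reversed (or any other) permutation within the segment
// still coalesces into a single 128-byte transaction.
__global__ void reversed_within_segment(const int *A, int *out, int n)
{
    int gtid        = blockIdx.x * blockDim.x + threadIdx.x;
    int warp_global = gtid / 32;                       // global warp index
    int lane        = threadIdx.x % 32;                // lane within the warp
    int word        = warp_global * 32 + (31 - lane);  // reversed within the segment
    if (word < n)
        out[gtid] = A[word];
}
```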
The GPU has something like a "crossbar switch" in the memory controller that distributes elements to threads as needed. You may be interested in this GPU webinar, which discusses coalescing and illustrates this particular case pictorially (on slide 12).
The NVIDIA webinar page has other useful webinars you may be interested in as well.
For pre-cc 2.0 devices, the specifics vary by compute capability, but compute capability 1.0 and 1.1 devices do not have this ability to coalesce reads that are in "reverse order" or random order.