From non-coalesced to coalesced memory access in CUDA
Question: I was wondering whether there is a simple way to transform a non-coalesced memory access into a coalesced one. Take this array as an example: dW[[w0,w1,w2][w3,w4,w5][w6,w7][w8,w9]]. I know that if thread 0 in block 0 accesses dW[0] and thread 1 in block 0 accesses dW[1], that is a coalesced access to global memory. The problem is that I have two operations. The first one is coalesced, as described above. The second one is not, because thread 1 in block 0 needs to do an operation