I'm writing an image processing app where I have to fetch pixel data in an uncoalesced manner.
Initially I implemented my algorithm using global memory. Later I reimpleme
Textures can indeed be useful on devices of compute capability >= 2.0.
Textures and cudaArrays can store their data along a space-filling curve, which can give a better cache hit rate because texels that are close together in 2D also end up close together in memory.
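To make the locality argument concrete, here is a minimal host-side sketch comparing a row-major index with a Z-order (Morton) index. This is only an illustration: NVIDIA does not document the exact layout a cudaArray uses, so Morton order is an assumption standing in for "some space-filling curve", and the names `part1by1`, `morton2d` and `rowMajor` are made up for this example.

```cpp
#include <cstdint>

// Spread the low 16 bits of v so a zero bit sits between each original bit.
static std::uint32_t part1by1(std::uint32_t v) {
    v &= 0x0000FFFFu;
    v = (v | (v << 8)) & 0x00FF00FFu;
    v = (v | (v << 4)) & 0x0F0F0F0Fu;
    v = (v | (v << 2)) & 0x33333333u;
    v = (v | (v << 1)) & 0x55555555u;
    return v;
}

// Z-order (Morton) index: interleave the bits of x (even positions)
// and y (odd positions). Assumes x and y each fit in 16 bits.
std::uint32_t morton2d(std::uint32_t x, std::uint32_t y) {
    return part1by1(x) | (part1by1(y) << 1);
}

// Ordinary pitch-linear (row-major) index for comparison.
std::uint32_t rowMajor(std::uint32_t x, std::uint32_t y, std::uint32_t width) {
    return y * width + x;
}
```

For a 1024-pixel-wide image, the vertical neighbours (5, 4) and (5, 5) are 1024 elements apart in row-major order but only 2 apart in this Morton order. That is the kind of 2D locality a cache can exploit when threads read a small 2D neighbourhood.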
The texture cache is separate from the other caches, so it has its own dedicated memory and bandwidth, and reading from it does not interfere with the other caches. This can become important if there is a lot of pressure on your L1/L2 caches.
Textures also provide built-in functionality such as interpolation, various addressing modes (clamp, wrap, mirror) and normalized addressing with floating-point coordinates. The texture hardware does this work as part of the fetch, so it comes at no extra cost and can greatly improve performance in kernels that need such functionality.
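As a rough picture of what the texture unit does for you, here is a host-side reference sketch of the three addressing modes and a 1D linear fetch. The function names are hypothetical and this is not the hardware implementation. It assumes unnormalized coordinates with texel centres at i + 0.5, and it uses full float precision for the interpolation weight, whereas the hardware uses a lower-precision fixed-point weight. Also note that in CUDA itself, wrap and mirror are only available with normalized coordinates.

```cpp
#include <cmath>

// Clamp: indices outside [0, n-1] stick to the nearest edge texel.
int addrClamp(int i, int n) {
    return i < 0 ? 0 : (i >= n ? n - 1 : i);
}

// Wrap: the texture repeats with period n.
int addrWrap(int i, int n) {
    int m = i % n;
    return m < 0 ? m + n : m;
}

// Mirror: the texture repeats with period 2n, every other copy reflected.
int addrMirror(int i, int n) {
    int m = addrWrap(i, 2 * n);
    return m < n ? m : 2 * n - 1 - m;
}

// 1D linear filtering with clamp addressing. The 0.5 shift places
// texel i's centre at coordinate i + 0.5.
float fetchLinear1D(const float* texels, int n, float x) {
    float xb = x - 0.5f;
    float fl = std::floor(xb);
    int i = static_cast<int>(fl);
    float a = xb - fl;  // weight of the right-hand texel, in [0, 1)
    return (1.0f - a) * texels[addrClamp(i, n)]
         + a * texels[addrClamp(i + 1, n)];
}
```

In a kernel that reads from global memory, you would have to write this index and blend arithmetic yourself and spend instructions on it. With a texture, the same result comes back from a single fetch.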
On early CUDA architectures, textures and cudaArrays could not be written by a kernel. On architectures of compute capability >= 2.0, they can be written via CUDA surfaces.
Determining if you should use textures or a regular buffer in global memory comes down to the intended usage and access patterns for the memory. It will be project specific.
You are using the Fermi architecture, with a device that has been rebranded into the 6xx series.
For those on the Kepler architecture, take a look at NVIDIA's Inside Kepler presentation, in particular the slides "Texture Performance", "Texture Cache Unlocked" and "const __restrict Example".