CUDA optimization techniques

Submitted by 二次信任 on 2020-01-24 22:14:28

Question


I have written CUDA code to solve an NP-complete problem, but the performance was not what I expected.

I know about "some" optimization techniques (using shared memory, textures, zero-copy, ...).

What are the most important optimization techniques CUDA programmers should know about?


Answer 1:


You should read NVIDIA's CUDA Programming Best Practices guide: http://developer.download.nvidia.com/compute/cuda/3_0/toolkit/docs/NVIDIA_CUDA_BestPracticesGuide.pdf

The guide lists many performance tips, each with an associated "priority". Here are some of the highest-priority ones:

  1. Measure your kernel's effective bandwidth and compare it with the device's peak memory bandwidth to work out what the upper bound on performance ought to be
  2. Minimize memory transfers between host and device - even if that means doing calculations on the device that are not efficient there
  3. Coalesce global memory accesses wherever possible
  4. Prefer shared memory access to global memory access
  5. Avoid divergent branches within a single warp, as the warp then executes each branch path one after the other
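
Several of these tips can be seen together in a matrix transpose, which is a common teaching example rather than anything taken from the question. The sketch below is only an illustration under stated assumptions: the kernel names, the 32x32 tile size, the 2048x2048 matrix and the helper `effectiveBandwidthGBs` are all hypothetical, and error checking is omitted for brevity. It uploads the data once, runs a naive kernel (strided writes) and a shared-memory kernel (coalesced reads and writes), and reports effective bandwidth as (bytes read + bytes written) / elapsed time.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

constexpr int TILE = 32;  // illustrative tile width: one warp per tile row

// Naive transpose: global reads are coalesced, but a warp's writes are n floats apart.
__global__ void transposeNaive(const float* in, float* out, int n) {
    int x = blockIdx.x * TILE + threadIdx.x;
    int y = blockIdx.y * TILE + threadIdx.y;
    if (x < n && y < n) out[x * n + y] = in[y * n + x];
}

// Tiled transpose: stage a tile in shared memory so both the global read
// and the global write touch consecutive addresses within a warp.
__global__ void transposeTiled(const float* in, float* out, int n) {
    __shared__ float tile[TILE][TILE + 1];  // +1 column of padding avoids bank conflicts
    int x = blockIdx.x * TILE + threadIdx.x;
    int y = blockIdx.y * TILE + threadIdx.y;
    if (x < n && y < n) tile[threadIdx.y][threadIdx.x] = in[y * n + x];
    __syncthreads();
    x = blockIdx.y * TILE + threadIdx.x;  // block indices swapped for the output tile
    y = blockIdx.x * TILE + threadIdx.y;
    if (x < n && y < n) out[y * n + x] = tile[threadIdx.x][threadIdx.y];
}

using Kernel = void (*)(const float*, float*, int);

// Times one launch with CUDA events and returns effective bandwidth in GB/s.
float effectiveBandwidthGBs(Kernel kernel, const float* dIn, float* dOut, int n) {
    dim3 block(TILE, TILE), grid((n + TILE - 1) / TILE, (n + TILE - 1) / TILE);
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    kernel<<<grid, block>>>(dIn, dOut, n);  // warm-up launch, not timed
    cudaEventRecord(start);
    kernel<<<grid, block>>>(dIn, dOut, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    double bytes = 2.0 * n * n * sizeof(float);       // every element is read once and written once
    return static_cast<float>(bytes / (ms * 1.0e6));  // bytes / seconds / 1e9
}

// Host-side check that b is the transpose of the n x n matrix a.
bool isTranspose(const std::vector<float>& a, const std::vector<float>& b, int n) {
    for (int y = 0; y < n; ++y)
        for (int x = 0; x < n; ++x)
            if (b[x * n + y] != a[y * n + x]) return false;
    return true;
}

int main() {
    const int n = 2048;  // arbitrary size chosen for the sketch
    const size_t bytes = size_t(n) * n * sizeof(float);
    std::vector<float> hIn(size_t(n) * n), hOut(size_t(n) * n);
    for (size_t i = 0; i < hIn.size(); ++i) hIn[i] = static_cast<float>(i % 1000);

    float *dIn = nullptr, *dOut = nullptr;
    cudaMalloc(&dIn, bytes);
    cudaMalloc(&dOut, bytes);
    cudaMemcpy(dIn, hIn.data(), bytes, cudaMemcpyHostToDevice);  // single upload

    float naive = effectiveBandwidthGBs(transposeNaive, dIn, dOut, n);
    float tiled = effectiveBandwidthGBs(transposeTiled, dIn, dOut, n);  // dOut now holds the tiled result

    cudaMemcpy(hOut.data(), dOut, bytes, cudaMemcpyDeviceToHost);  // single download
    printf("naive: %.1f GB/s, tiled: %.1f GB/s, correct: %s\n",
           naive, tiled, isTranspose(hIn, hOut, n) ? "yes" : "no");
    cudaFree(dIn);
    cudaFree(dOut);
    return 0;
}
```

Built with something like `nvcc transpose.cu -o transpose`, the two figures can be compared against the peak memory bandwidth in the device's specification. How large the gap between the kernels is depends on the GPU generation and its caches, so treat the numbers as a measurement to take, not a result to expect.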

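The branching tip is easiest to see by comparing a condition that differs between neighbouring threads with one that is uniform across a warp. The following is a minimal sketch, not code from the guide: `pathAPerThread`, `pathAPerWarp`, `workA`, `workB` and the problem size are hypothetical names and values chosen for illustration, and error checking is left out. The two kernels assign the paths to different elements, so this restructuring only applies when the work can be regrouped that way.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Per-thread condition: neighbouring lanes disagree, so every warp takes both paths.
__host__ __device__ inline bool pathAPerThread(int tid) { return tid % 2 == 0; }

// Per-warp condition: all lanes of a warp agree, so each warp takes a single path.
__host__ __device__ inline bool pathAPerWarp(int tid, int warp) { return (tid / warp) % 2 == 0; }

// Two deliberately different pieces of work, so the branch is not trivially predicated away.
__device__ float workA(float v) {
    for (int i = 0; i < 256; ++i) v = v * 1.0001f + 0.5f;
    return v;
}
__device__ float workB(float v) {
    for (int i = 0; i < 256; ++i) v = sqrtf(v * v + 1.0f);
    return v;
}

__global__ void branchPerThread(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    if (pathAPerThread(threadIdx.x)) data[i] = workA(data[i]);
    else                             data[i] = workB(data[i]);
}

__global__ void branchPerWarp(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    if (pathAPerWarp(threadIdx.x, warpSize)) data[i] = workA(data[i]);
    else                                     data[i] = workB(data[i]);
}

// Times one launch of `kernel` in milliseconds using CUDA events.
float timeMs(void (*kernel)(float*, int), float* d, int n) {
    const int block = 256;
    const int grid = (n + block - 1) / block;
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    kernel<<<grid, block>>>(d, n);  // warm-up launch, not timed
    cudaEventRecord(start);
    kernel<<<grid, block>>>(d, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}

int main() {
    const int n = 1 << 22;  // arbitrary problem size for the sketch
    float* d = nullptr;
    cudaMalloc(&d, n * sizeof(float));
    cudaMemset(d, 0, n * sizeof(float));
    float divergent = timeMs(branchPerThread, d, n);
    float uniform = timeMs(branchPerWarp, d, n);
    printf("per-thread branch: %.3f ms, per-warp branch: %.3f ms\n", divergent, uniform);
    cudaFree(d);
    return 0;
}
```

The compiler may still turn short branches into predicated instructions, so the measured difference depends on how heavy each path is and on the target architecture.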


Answer 2:


The new NVIDIA Visual Profiler (v4.1) supports automated performance analysis to identify performance improvement opportunities in your application. It also links directly to the most relevant sections of the Best Practices Guide for the issues it detects. The Visual Profiler is available for free as part of the CUDA Toolkit on NVIDIA's developer website: http://www.nvidia.com/getcuda.



Source: https://stackoverflow.com/questions/3090493/cuda-optimization-techniques
