What is the purpose of using multiple “arch” flags in Nvidia's NVCC compiler?

Submitted by 强颜欢笑 on 2019-11-26 21:44:19

Roughly speaking, the code compilation flow goes like this:

CUDA C/C++ device code source --> PTX --> SASS

The virtual architecture (e.g. compute_20, i.e. whatever is specified by -arch compute...) determines what type of PTX code will be generated. The additional switches (e.g. -code sm_21) determine what type of SASS code will be generated. SASS is the actual executable object code for a GPU (machine language). An executable can contain multiple versions of SASS and/or PTX, and a runtime loader mechanism picks the appropriate version based on the GPU actually being used.
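The Python sketch below makes the split between virtual and real architectures concrete. It takes (arch, code) pairs in the form nvcc's -gencode option accepts (arch=compute_XY,code=...). It then lists which PTX and SASS images would be kept in the executable. The helper names parse_arch and embedded_images are hypothetical and not part of any NVIDIA tool. The model is simplified in three ways:

- It assumes plain names like compute_52 or sm_52.
- It only checks that a target is not older than its virtual architecture.
- nvcc's own validation may be stricter than this check.

```python
import re

# Simplified name format: "compute_XY" (virtual) or "sm_XY" (real).
# Arch-specific variants such as "sm_90a" are deliberately not handled.
_ARCH_RE = re.compile(r"^(compute|sm)_(\d+)(\d)$")


def parse_arch(name):
    """Return ("ptx" or "sass", (major, minor)) for an architecture name."""
    m = _ARCH_RE.match(name)
    if m is None:
        raise ValueError(f"unrecognized architecture name: {name!r}")
    kind = "ptx" if m.group(1) == "compute" else "sass"
    return kind, (int(m.group(2)), int(m.group(3)))


def embedded_images(gencode_pairs):
    """List the images kept in the executable for (arch, code) pairs as given to -gencode.

    arch must be a virtual architecture: it decides which PTX is generated.
    code is either a real architecture (sm_XY, kept as SASS) or a virtual one
    (compute_XY, the PTX itself is kept for later JIT compilation).
    """
    images = []
    for arch, code in gencode_pairs:
        arch_kind, arch_cc = parse_arch(arch)
        if arch_kind != "ptx":
            raise ValueError(f"arch= must be a virtual architecture, got {arch!r}")
        code_kind, code_cc = parse_arch(code)
        # Simplified check: a target may not be older than the PTX it is built from.
        # nvcc's real validation may be stricter than this.
        if code_cc < arch_cc:
            raise ValueError(f"{code} is older than {arch}")
        images.append((code_kind, code))
    return images
```

Take the -arch compute_20 -code sm_21 case from the paragraph above. Under this model, embedded_images([("compute_20", "sm_21")]) returns [("sass", "sm_21")]. The compute_20 PTX is generated only as an intermediate step, and just the sm_21 SASS is kept.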

As you point out, one of the handy features of GPU operation is JIT compilation. The GPU driver performs the JIT compile, which does not require the CUDA toolkit to be installed. It happens whenever suitable PTX code is available but suitable SASS code is not.

One advantage of including multiple virtual architectures (i.e. multiple versions of PTX), then, is that you have executable compatibility with a wider variety of target GPU devices (although some devices may trigger a JIT-compile to create the necessary SASS).

One advantage of including multiple "real GPU targets" (i.e. multiple SASS versions) is that you can avoid the JIT-compile step when one of those target devices is present.

If you specify a bad set of options, it's possible to create an executable that won't run (correctly) on a particular GPU.
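The JIT and compatibility behaviour described above can be approximated with a small model. The sketch below uses a hypothetical select_image helper and assumes these rules:

- SASS built for sm_XY runs on devices with the same major version and an equal or higher minor version.
- PTX for compute_XY can be JIT-compiled for any device of compute capability X.Y or newer.
- The driver prefers a compatible SASS image over JIT compilation.

Arch-specific variants such as sm_90a, and any other driver heuristics, are ignored.

```python
import re

# Simplified name format: "compute_XY" (PTX) or "sm_XY" (SASS).
_ARCH_RE = re.compile(r"^(compute|sm)_(\d+)(\d)$")


def _parse(name):
    m = _ARCH_RE.match(name)
    if m is None:
        raise ValueError(f"unrecognized architecture name: {name!r}")
    return m.group(1), (int(m.group(2)), int(m.group(3)))


def select_image(embedded, device_cc):
    """Simplified model of which embedded image is used on a given GPU.

    embedded: names of the images in the executable, e.g. ["sm_35", "compute_35"].
    device_cc: (major, minor) compute capability of the GPU actually present.
    Returns ("sass", name) to run SASS directly, ("jit", name) when PTX must be
    JIT-compiled by the driver, or None when nothing can run on this device.
    """
    sass, ptx = [], []
    for name in embedded:
        kind, cc = _parse(name)
        if kind == "sm":
            # Assumed rule: same major version, equal or lower minor version.
            if cc[0] == device_cc[0] and cc[1] <= device_cc[1]:
                sass.append((cc, name))
        elif cc <= device_cc:
            # Assumed rule: PTX can be JIT-compiled for the same or a newer GPU.
            ptx.append((cc, name))
    if sass:
        return ("sass", max(sass)[1])  # newest compatible SASS, no JIT needed
    if ptx:
        return ("jit", max(ptx)[1])  # fall back to JIT from the newest usable PTX
    return None
```

Under these assumptions, an executable holding only sm_35 SASS has nothing a compute capability 5.2 GPU can run. That is one kind of "bad set of options". Adding compute_35 PTX lets the driver JIT-compile for that GPU instead, while a 3.5 device still uses the SASS directly.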

One possible disadvantage of specifying a lot of these options is code size bloat. Another possible disadvantage is compile time, which will generally be longer as you specify more options.

It's also possible to create executables that contain no PTX, which may be of interest to those trying to obscure their IP.

To include PTX suitable for JIT compilation in the executable, specify a virtual architecture for the -code switch.
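Whether an executable carries PTX comes down to whether a virtual architecture appears as a code= target. The sketch below expresses that choice as a Python helper that builds nvcc -gencode arguments. The helper name gencode_args is hypothetical, as are the file names kernel.cu and app in the usage line.

```python
def gencode_args(virtual, reals, embed_ptx=True):
    """Build nvcc -gencode arguments for one virtual architecture.

    virtual: e.g. "compute_52"; selects which PTX is generated.
    reals: e.g. ["sm_52", "sm_61"]; one SASS image is kept per entry.
    embed_ptx: if True, also pass code=<virtual> so the PTX itself is kept
    for JIT compilation on other GPUs; if False, the executable is SASS-only.
    """
    args = []
    for real in reals:
        args += ["-gencode", f"arch={virtual},code={real}"]
    if embed_ptx:
        args += ["-gencode", f"arch={virtual},code={virtual}"]
    return args


# Hypothetical usage: the argument list for a build with SASS for two GPUs plus PTX.
cmd = ["nvcc", "kernel.cu", "-o", "app"] + gencode_args("compute_52", ["sm_52", "sm_61"])
```

With embed_ptx=False the result is the no-PTX build mentioned above. With the default embed_ptx=True, the extra arch=compute_52,code=compute_52 pair keeps the PTX for JIT compilation, at the cost of a somewhat larger executable.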

The purpose of multiple -arch flags is to use the __CUDA_ARCH__ macro for conditional compilation (i.e. using #if/#ifdef) of differently optimized code paths.
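Roughly speaking, device code is compiled once per virtual architecture. In each such pass, __CUDA_ARCH__ holds a three-digit value derived from that architecture, for example 350 for compute_35. The Python sketch below only models that bookkeeping. Both helpers are hypothetical: cuda_arch_values collects the macro values, and chosen_path stands in for a device-code block of the form #if __CUDA_ARCH__ >= 350 ... #else ... #endif.

```python
def cuda_arch_values(gencode_pairs):
    """Distinct __CUDA_ARCH__ values, one per virtual architecture in the pairs.

    Assumes the value is major*100 + minor*10, e.g. compute_35 -> 350.
    """
    values = []
    for arch, _code in gencode_pairs:
        if not arch.startswith("compute_"):
            raise ValueError(f"expected a virtual architecture, got {arch!r}")
        digits = arch.split("_", 1)[1]
        value = int(digits[:-1]) * 100 + int(digits[-1]) * 10
        if value not in values:
            values.append(value)
    return values


def chosen_path(cuda_arch):
    # Models '#if __CUDA_ARCH__ >= 350 ... #else ... #endif' in device code.
    return "fast path" if cuda_arch >= 350 else "fallback path"
```

For a build with compute_30 and compute_35 passes, the first pass compiles the fallback branch and the second compiles the faster one. The runtime selection described in the first answer then decides which of the resulting images is used.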

See here: http://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html#virtual-architecture-identification-macro
