CUDA: How to use -arch and -code and SM vs COMPUTE
I am still not sure how to properly specify the architectures for code generation when building with nvcc. I am aware that both machine code and PTX code are embedded in my binary, and that this can be controlled via the compiler switches -code and -arch (or a combination of both using -gencode). Now, according to this, apart from the two compiler flags there are also two ways of specifying architectures: sm_XX and compute_XX, where compute_XX refers to a virtual architecture and sm_XX to a real one. The flag -arch only takes identifiers for virtual architectures (such as compute_XX)