The behavior of the __CUDA_ARCH__ macro

Submitted by ぃ、小莉子 on 2021-01-27 14:07:10

Question


In host code, it seems that the __CUDA_ARCH__ macro won't generate different code paths; instead, it generates code for exactly the code path of the current device.

However, if __CUDA_ARCH__ is used within device code, it generates a different code path for each device architecture specified in the compilation options (/arch).

Can anyone confirm this is correct?


Answer 1:


When used in device code, __CUDA_ARCH__ is defined as a number that reflects the architecture the code is currently being compiled for (for example, 350 when compiling for compute_35).
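A minimal sketch of branching on that number inside device code follows. The helper name threads_per_tile, the DEVICE_ONLY wrapper macro, and the values 64 and 32 are assumptions made up for illustration; the wrapper only exists so the file also builds with a plain C++ compiler.

```cpp
// Hypothetical wrapper: __CUDACC__ is defined only when nvcc is the compiler,
// so the __device__ qualifier is dropped when building with a plain C++ compiler.
#ifdef __CUDACC__
#define DEVICE_ONLY __device__
#else
#define DEVICE_ONLY
#endif

// Hypothetical helper: the tile sizes are illustrative values only.
DEVICE_ONLY inline int threads_per_tile()
{
#if defined(__CUDA_ARCH__) && (__CUDA_ARCH__ >= 700)
    return 64;   // device pass targeting compute_70 or newer
#elif defined(__CUDA_ARCH__)
    return 32;   // device pass targeting an older virtual architecture
#else
    return 0;    // not a device pass: the macro is undefined here
#endif
}
```

When nvcc is given several targets (for instance two -gencode options), the device code is compiled once per virtual architecture, and each pass sees its own value of the macro, so each pass keeps only one of the device branches.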

It is not intended to be used in host code. From the nvcc manual:

This macro can be used in the implementation of GPU functions for determining the virtual architecture for which it is currently being compiled. The host code (the non-GPU code) must not depend on it.

Usage of __CUDA_ARCH__ in host code is therefore undefined (at least by CUDA). As @tera pointed out in the comments, because the macro is not defined during host compilation, it can be used to differentiate host and device code paths, for example in a __host__ __device__ function definition:

#ifndef __CUDA_ARCH__
//host code here
#else
//device code here
#endif
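As a slightly fuller sketch of the same pattern, the function below reports which compilation pass produced it. The names compiled_for and HOST_DEVICE are assumptions introduced for illustration, not something taken from the nvcc manual; the wrapper macro is only there so the file also builds without nvcc.

```cpp
// Hypothetical wrapper: expands to the CUDA qualifiers under nvcc
// (which defines __CUDACC__) and to nothing under a plain C++ compiler.
#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

// Hypothetical helper: returns 0 when built by the host pass, and the
// virtual architecture number when built by a device pass.
HOST_DEVICE inline int compiled_for()
{
#ifndef __CUDA_ARCH__
    return 0;               // host pass: the macro is not defined
#else
    return __CUDA_ARCH__;   // device pass: the macro holds the architecture number
#endif
}
```

Calling compiled_for() from ordinary host code takes the first branch, while a kernel calling it takes the second. The choice is made by the preprocessor in each compilation pass, not at run time.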


Source: https://stackoverflow.com/questions/16072740/the-behavior-of-cuda-arch-macro
