I'm just a beginner with CUDA and Nsight and want to take advantage of the GPU's performance for linear algebra operations (e.g. CUBLAS). I've got a lot of custom code written with the Eigen template library.
The CUDA front-end parser for C++ code cannot correctly parse extremely complex host template definitions in all situations. Its job is to look through the code in a .cu file and split out the code that must be compiled by the GPU toolchain from the code that should pass through to the host compiler. It is known to fail when Boost and Qt headers are included in .cu files, and I'll wager the Eigen templates are causing the same problem.
The only solution I am aware of is to refactor your code so that the host code which relies on the templates lives in a separate file with a .cc extension. The CUDA front end never sees code in a .cc file, and the problem disappears. In practice this sort of code splitting isn't really a problem, because the host template code can't be used inside CUDA GPU code anyway; at worst you might need a small wrapper function or an additional level of abstraction to keep your GPU and host code separate, as in the sketch below.
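For example, a minimal sketch of that split might look like the following. The file names, the wrapper function, and the Eigen-based solver are purely illustrative, not something from your code; the only point is that no Eigen header is ever included from a .cu file:

```cpp
// solver.h -- plain C++ interface shared by both translation units
#pragma once
#include <vector>
std::vector<float> solve_on_host(const std::vector<float>& A,
                                 const std::vector<float>& b, int n);

// solver.cc -- host-only file, never seen by the CUDA front end
#include <Eigen/Dense>
#include "solver.h"

std::vector<float> solve_on_host(const std::vector<float>& A,
                                 const std::vector<float>& b, int n)
{
    // Eigen templates are safe here because nvcc never parses this file
    Eigen::Map<const Eigen::MatrixXf> Am(A.data(), n, n);
    Eigen::Map<const Eigen::VectorXf> bm(b.data(), n);
    Eigen::VectorXf x = Am.colPivHouseholderQr().solve(bm);
    return std::vector<float>(x.data(), x.data() + n);
}

// kernels.cu -- device code only; includes no Eigen/Boost/Qt headers
#include "solver.h"   // calling the wrapper from host code here is fine

__global__ void scale(float* d, float alpha, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] *= alpha;
}
```

Here solver.cc is compiled by the host compiler alone, nvcc handles only kernels.cu, and the shared header exposes the interface purely in terms of standard types, so none of the template machinery ever reaches the CUDA parser.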