I'm trying to parallelize an existing application. Most of it is already parallelized and running on the GPU, but I'm having trouble migrating one function to the GPU.
CUDA Toolkit 5.0 introduced a device linker that can link separately compiled device object files. I believe that, as of CUDA Toolkit 5.0, CUBLAS functions can also be called from device functions (but I have only reviewed the headers; I have no experience using CUBLAS).
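To illustrate what the device linker makes possible, here is a minimal sketch, assuming a current `nvcc`. A kernel calls a `__device__` function that is only *declared* in the caller's translation unit and is resolved at device-link time. The names `scale`, `apply_scale`, `run_scale_demo`, and the `ONLY_HELPER`/`ONLY_MAIN` macros are hypothetical. The macros let one file act as either the "helper" object or the "caller" object so the example stays self-contained. In a real project these would be two separate `.cu` files.

```cuda
// demo.cu: hedged sketch of relocatable device code (separate compilation).
//
// Whole-program build (one translation unit, no device link needed):
//   nvcc demo.cu -o demo
//
// Separate compilation: build helper and caller as two relocatable objects
// (-dc), then let nvcc perform the device link step while linking:
//   nvcc -dc -DONLY_HELPER demo.cu -o helper.o
//   nvcc -dc -DONLY_MAIN   demo.cu -o main.o
//   nvcc helper.o main.o -o demo
#include <cstdio>
#include <cuda_runtime.h>

#ifndef ONLY_MAIN
// "Helper" unit: in a real project this hypothetical device function
// would live in its own .cu file.
__device__ float scale(float x) { return 2.0f * x; }
#endif

#ifndef ONLY_HELPER
// Declaration only. When this unit is compiled with -dc, the call in the
// kernel is left unresolved and fixed up later by the device linker.
__device__ float scale(float x);

__global__ void apply_scale(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = scale(data[i]);
}

// Sets out[i] = i, runs the kernel on a device copy, and copies back.
// Returns 0 on success, otherwise the nonzero cudaError_t value.
int run_scale_demo(float *out, int n) {
    float *d = nullptr;
    cudaError_t err = cudaMalloc(&d, n * sizeof(float));
    if (err != cudaSuccess) return (int)err;
    for (int i = 0; i < n; ++i) out[i] = (float)i;
    err = cudaMemcpy(d, out, n * sizeof(float), cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        apply_scale<<<(n + 255) / 256, 256>>>(d, n);
        err = cudaGetLastError();  // catches launch-configuration errors
    }
    if (err == cudaSuccess)  // synchronous copy also waits for the kernel
        err = cudaMemcpy(out, d, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d);
    return (int)err;
}

int main() {
    const int n = 8;
    float h[n];
    int rc = run_scale_demo(h, n);
    if (rc != 0) {
        std::printf("CUDA error: %s\n", cudaGetErrorString((cudaError_t)rc));
        return 1;
    }
    for (int i = 0; i < n; ++i) std::printf("%g ", h[i]);
    std::printf("\n");
    return 0;
}
#endif
```

The key step is `-dc` (relocatable device code). Compiling the `ONLY_MAIN` variant without it should fail, because whole-program compilation cannot resolve a call to a device function defined in another object.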