I'm currently using ROS to develop a CUDA project. There are two nodes, corresponding to two host threads, that need to launch two different CUDA kernels concurrently. So I'm wonderin