I am working with CUDA on the Windows platform, where we have access to both Parallel Nsight and the Visual Profiler. Both are pretty good, but they have different strengths.
Parallel Nsight has the benefit of being built right into Visual Studio and features a natural workflow for Windows developers.
In Parallel Nsight 2.2, whenever the target is set to "localhost", the Monitor is started automatically. This applies to Analysis, CUDA profiling, and CUDA debugging alike.
The Monitor takes a short time to start up (roughly as long as it takes to launch your favorite web browser), but this is a one-time cost. Until the Monitor is terminated or the machine is restarted, there is no need to start it again.