If you want to see kernels overlapping with other kernels (concurrent kernels), you need to use the Visual Profiler 5.0 that comes with the CUDA 5.0 Toolkit. I don't think earlier versions of the profiler are capable of this. It should also show kernel and memcpy overlap.
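If you need something to look at in the timeline, here is a minimal sketch of a test program. It is not taken from your code: the `spin` kernel, the sizes, and the iteration count are all made up for illustration. It issues a host-to-device copy, a kernel, and a device-to-host copy into each of two streams, which is the usual pattern for getting kernel/kernel and kernel/memcpy overlap. Whether they actually overlap depends on your GPU and on how much of the device each kernel occupies.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical busy-work kernel, only there to run long enough
// to be clearly visible in the profiler timeline.
__global__ void spin(float *data, int n, int iters)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float v = data[i];
        for (int k = 0; k < iters; ++k)
            v = v * 1.0001f + 0.0001f;
        data[i] = v;
    }
}

int main()
{
    const int nStreams = 2;
    const int n = 1 << 12;              // small grid (assumption) so both kernels can be resident at once
    const size_t bytes = n * sizeof(float);

    cudaStream_t streams[nStreams];
    float *h[nStreams];
    float *d[nStreams];

    for (int s = 0; s < nStreams; ++s) {
        cudaStreamCreate(&streams[s]);
        // Pinned host memory: async copies from pageable memory will not overlap.
        cudaMallocHost((void **)&h[s], bytes);
        cudaMalloc((void **)&d[s], bytes);
        for (int i = 0; i < n; ++i)
            h[s][i] = 1.0f;
    }

    // Each stream gets its own copy-in, kernel, copy-out sequence.
    // Work in different non-default streams is allowed to run concurrently.
    for (int s = 0; s < nStreams; ++s) {
        cudaMemcpyAsync(d[s], h[s], bytes, cudaMemcpyHostToDevice, streams[s]);
        spin<<<(n + 255) / 256, 256, 0, streams[s]>>>(d[s], n, 100000);
        cudaMemcpyAsync(h[s], d[s], bytes, cudaMemcpyDeviceToHost, streams[s]);
    }

    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess)
        printf("CUDA error: %s\n", cudaGetErrorString(err));

    for (int s = 0; s < nStreams; ++s) {
        cudaFree(d[s]);
        cudaFreeHost(h[s]);
        cudaStreamDestroy(streams[s]);
    }

    // Resetting the device before exit makes sure the profiler data is flushed.
    cudaDeviceReset();
    return 0;
}
```

Build it with `nvcc` and run it under the profiler. If the kernels from the two streams overlap, they should appear side by side in time on separate stream rows of the timeline.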