I have some problems running my code on a GTX 480 (Compute Capability 2.0).
I always get the following error if I launch the kernel with 1024 threads per block:
Have you tried upgrading the GPU driver? For me, the program seemed to run fine until, by bad luck, it failed with exactly the same problem. There were no warnings about a minimum driver version at all.
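Because the actual error message isn't shown, it can help to print the device and per-kernel limits and check the launch status explicitly. On Compute Capability 2.0, a block can have at most 1024 threads and 32768 registers. A kernel that uses more than 32 registers per thread therefore cannot launch with 1024 threads, even though 1024 is within the device limit. The sketch below is only a diagnostic, not the asker's code. `myKernel`, the grid size and the buffer are hypothetical stand-ins. It uses only standard CUDA runtime calls (`cudaGetDeviceProperties`, `cudaFuncGetAttributes`, `cudaGetLastError`, `cudaDeviceSynchronize`).

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for the real kernel (the original code is not shown).
__global__ void myKernel(float *out)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    out[i] = static_cast<float>(i);
}

int main()
{
    const int threadsPerBlock = 1024;  // the block size that fails in the question
    const int blocks = 4;              // hypothetical grid size

    // Device-wide limits (CC 2.0: maxThreadsPerBlock = 1024, regsPerBlock = 32768).
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        std::fprintf(stderr, "no usable CUDA device\n");
        return 1;
    }
    std::printf("%s: CC %d.%d, maxThreadsPerBlock=%d, regsPerBlock=%d\n",
                prop.name, prop.major, prop.minor,
                prop.maxThreadsPerBlock, prop.regsPerBlock);

    // Per-kernel limits: high register use lowers the usable block size below 1024.
    cudaFuncAttributes attr;
    if (cudaFuncGetAttributes(&attr, myKernel) == cudaSuccess)
        std::printf("myKernel: numRegs=%d, maxThreadsPerBlock=%d\n",
                    attr.numRegs, attr.maxThreadsPerBlock);

    float *d_out = NULL;
    cudaError_t err = cudaMalloc(&d_out, blocks * threadsPerBlock * sizeof(float));
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaMalloc: %s\n", cudaGetErrorString(err));
        return 1;
    }

    myKernel<<<blocks, threadsPerBlock>>>(d_out);
    // Launch-configuration errors (e.g. too many resources requested) are reported here...
    err = cudaGetLastError();
    if (err == cudaSuccess)
        // ...while errors raised during execution only surface after synchronizing.
        err = cudaDeviceSynchronize();
    std::printf("launch: %s\n", cudaGetErrorString(err));

    cudaFree(d_out);
    return err == cudaSuccess ? 0 : 1;
}
```

If the printed per-kernel `maxThreadsPerBlock` is below 1024, the launch is limited by the kernel's resource use rather than by the driver. In that case, capping registers with `__launch_bounds__(1024)` or nvcc's `--maxrregcount` option, or using a smaller block size, are the usual remedies.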