Changing the arch argument in CUDA makes me use more registers
Question

I have been writing a kernel on my Tesla K20m. When I compile the software with -Xptxas=-v, I obtain the following results:

ptxas info : 0 bytes gmem
ptxas info : Compiling entry function '_Z9searchKMPPciPhiPiS1_' for 'sm_10'
ptxas info : Used 8 registers, 80 bytes smem, 8 bytes cmem[1]

As you can see, only 8 registers are used. However, if I pass the argument -arch=sm_35, my kernel's execution time rises dramatically, and so does the number of registers used, and I am wondering why nvcc