Question
I'm pretty new to CUDA and flying a bit by the seat of my pants here...
I'm trying to debug my CUDA program on a remote machine where I don't have admin rights. I compile my program with nvcc -g -G and then try to debug it with cuda-gdb. However, as soon as gdb hits a kernel call (it doesn't even have to enter the kernel, and the problem doesn't occur in host code), I get:
(cuda-gdb) run
Starting program: /path/to/my/binary/cuda_clustered_tree
[Thread debugging using libthread_db enabled]
[1]+ Stopped cuda-gdb cuda_clustered_tree
cuda-gdb then dumps me back to my terminal. If I try to run cuda-gdb again, I get
An instance of cuda-gdb (pid 4065) is already using device 0. If you believe
you are seeing this message in error, try deleting /tmp/cuda-dbg/cuda-gdb.lock.
The only way to recover is to kill -9 cuda-gdb and cuda_clustered_ (I assume the latter is part of my binary).
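The recovery step above can be sketched as a short shell snippet. This is only a minimal sketch under stated assumptions: the helper name `pid_from_lock_message` is hypothetical, and the pid and lock path are simply the ones quoted in the error message, so check them against your own output before killing or deleting anything.

```shell
#!/bin/sh
# Hypothetical helper (not part of cuda-gdb): extract the pid that cuda-gdb
# reports in its "already using device" message.
pid_from_lock_message() {
    printf '%s\n' "$1" | sed -n 's/.*(pid \([0-9][0-9]*\)).*/\1/p'
}

# Usage, left commented out so nothing is killed by accident:
# msg='An instance of cuda-gdb (pid 4065) is already using device 0.'
# kill -9 "$(pid_from_lock_message "$msg")"
# rm -f /tmp/cuda-dbg/cuda-gdb.lock   # path as printed in the message; remove only if stale
```

Killing the stale processes only clears the symptom; it does not address why cuda-gdb was stopped in the first place.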
This machine has two GPUs and is running CUDA 4.1 (I believe; several versions are installed, but that's the one my PATH and LD_LIBRARY_PATH point to), and it compiles and runs deviceQuery and bandwidthTest fine.
I can provide more info if need be. I've searched everywhere I could find online and found no help with this.
Answer 1:
Figured it out! Turns out, cuda-gdb hates csh.
If csh is your login shell, cuda-gdb exhibits the behavior described above. Even when I ran bash from within csh and then ran cuda-gdb, I still saw it. Your shell needs to be started as bash, and only bash.
On the machine, the default shell was csh, but I use bash. I wasn't allowed to change it directly, so I had added 'exec /bin/bash --login' to my .login script.
So even though I was running bash, it had been started by csh, and cuda-gdb still misbehaved. Removing the 'exec' command, so that I was running csh directly with nothing on top, showed the same behavior.
In the end, I had to get IT to change my login shell to bash directly (after much patient troubleshooting by them). Now it works as intended.
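To check whether you are in the same situation, it can help to look at the login shell recorded for your account, not just the shell you are currently typing in. The following is a minimal sketch, assuming a Linux system where `getent` is available; the helper name `login_shell_of` and the sample account line are hypothetical and only illustrate the passwd format.

```shell
#!/bin/sh
# Hypothetical helper: print the login-shell field (the 7th, colon-separated)
# of a passwd-style line.
login_shell_of() {
    printf '%s\n' "$1" | cut -d: -f7
}

# Made-up sample line, for illustration only.
login_shell_of 'alice:x:1000:1000::/home/alice:/bin/csh'

# On a real machine (commented out; assumes getent and ps are present):
# login_shell_of "$(getent passwd "$(id -un)")"   # shell the account logs in with
# ps -o comm= -p "$PPID"                          # process that started the current shell
```

If the first real command reports csh while you are typing into bash, your bash was started from csh, which is the setup that failed here.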
Source: https://stackoverflow.com/questions/10472184/cuda-gdb-exits-with-1-stopped-when-it-hits-a-kernel-call