Question
While I was testing a piece of CUDA code containing a memory bug, my screen froze. After rebooting, the graphics card is no longer detected. Is it possible that my code physically damaged the card?
This happened under Ubuntu 14.04. I don't know the model of the card, since I can no longer detect it, but I remember it is a fairly new one.
Answer 1:
Thanks to all the comments, I solved the problem.
I will list the actions I took. I'm not sure all of them had an effect, but eventually the problem was solved.
First I disconnected the graphics card and rebooted without it. Afterwards I plugged the card back in and rebooted. I was taken to a menu saying that I was running in low-graphics mode. I opened a tty (Ctrl+Alt+F1) and tried to reinstall the Nvidia drivers using the instructions here.
It initially failed because the nouveau drivers were running (which I think is the main culprit of the whole problem).
I blacklisted the drivers following this link.
In summary, create the file /etc/modprobe.d/blacklist-nouveau.conf and add:
blacklist nouveau
blacklist lbm-nouveau
options nouveau modeset=0
alias nouveau off
alias lbm-nouveau off
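
The blacklist step above can be sketched in Python. This is only an illustration, not part of the original fix: the helper names `write_blacklist` and `module_loaded` are hypothetical, writing under /etc/modprobe.d needs root, and the check assumes the usual Linux /proc/modules layout, where each line begins with a module name.

```python
from pathlib import Path

# The five lines quoted in the answer above.
BLACKLIST_LINES = [
    "blacklist nouveau",
    "blacklist lbm-nouveau",
    "options nouveau modeset=0",
    "alias nouveau off",
    "alias lbm-nouveau off",
]


def write_blacklist(path):
    """Write the blacklist entries to `path` (hypothetical helper).

    For the real file, pass /etc/modprobe.d/blacklist-nouveau.conf
    and run with root privileges.
    """
    Path(path).write_text("\n".join(BLACKLIST_LINES) + "\n")


def module_loaded(name, proc_modules_text):
    """Return True if `name` appears as a module in /proc/modules text."""
    for line in proc_modules_text.splitlines():
        fields = line.split()
        # The first whitespace-separated field is the module name.
        if fields and fields[0] == name:
            return True
    return False


if __name__ == "__main__":
    proc_modules = Path("/proc/modules")
    if proc_modules.exists():
        loaded = module_loaded("nouveau", proc_modules.read_text())
        print("nouveau loaded:", loaded)
    else:
        print("/proc/modules not found; is this a Linux system?")
```

Running the script after the reboot shows whether nouveau is still loaded. If it is, the blacklist has not taken effect yet.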
Then I rebooted. At that point my screen worked properly, but I couldn't start the Ubuntu desktop. I reinstalled the CUDA drivers (there were a few errors, but none of them fatal).
Then I rebooted again and everything was working.
Answering the main question: I did not damage the graphics card by testing CUDA code.
Answer 2:
I had the same issue with very GPU-intensive code, and the culprit was that the GPU was not properly cooled. After the manufacturer replaced the M2090 with a C2075 (nearly the same GPU, but with active cooling), the problem went away. Before that, we had replaced the motherboard and the GPU, with no improvement.
The GPU was not damaged. It simply entered a protection mode and worked normally again as soon as it cooled down.
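
To check whether overheating is a plausible cause, the GPU temperature can be polled while the code runs. The following is a minimal sketch, assuming `nvidia-smi` is installed and on the PATH. The helper names and the 85 °C limit are illustrative assumptions, not values from the answer above; the actual slowdown and shutdown thresholds depend on the card.

```python
import subprocess


def parse_temperatures(output):
    """Parse one temperature (degrees C) per non-empty line.

    Lines that are not integers (e.g. "[N/A]") become None.
    """
    temps = []
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            temps.append(int(line))
        except ValueError:
            temps.append(None)
    return temps


def read_gpu_temperatures():
    """Query nvidia-smi and return one temperature per GPU."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=temperature.gpu",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_temperatures(result.stdout)


def too_hot(temps, limit_c=85):
    """Return indices of GPUs at or above limit_c (an assumed threshold)."""
    return [i for i, t in enumerate(temps) if t is not None and t >= limit_c]


if __name__ == "__main__":
    try:
        temps = read_gpu_temperatures()
    except (FileNotFoundError, subprocess.CalledProcessError) as exc:
        print("could not query nvidia-smi:", exc)
    else:
        print("temperatures (C):", temps)
        print("GPUs over limit:", too_hot(temps))
```

If the readings climb steadily toward the limit during a long kernel run, cooling is worth checking before suspecting the code.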
Source: https://stackoverflow.com/questions/26060762/can-cuda-code-damage-a-gpu