Question
nvidia-smi is throwing this error:
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running
I purged the NVIDIA driver and reinstalled it following the steps mentioned here.
My device specs are as follows:
- Server with a Tesla M40
- Running on Ubuntu 16.04
- Kernel version Linux 4.4.0-116-generic x86_64
- Driver: nvidia-384
Can someone please help in solving the error?
Answer 1:
Try the following:
- Download the driver from here
- Remove your current installation:
sudo apt-get purge nvidia*
- Install what you downloaded earlier:
sudo dpkg -i nvidia-diag-driver-local-repo-ubuntu1604_375.66-1_amd64.deb
- Update the package lists and install the driver:
sudo apt-get update
sudo apt-get install cuda-drivers
After this, reboot your computer.
When it is back up, the nvidia-smi command should run smoothly.
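One quick way to confirm the result after the reboot is to look at the exit status of nvidia-smi instead of reading its output by eye; it generally exits with a non-zero status when it cannot talk to the driver. The sketch below assumes a POSIX shell. The function name driver_ok is made up for illustration and is not part of the original answer; it takes the command to run as its argument, so it can be tried with any command.

```shell
#!/bin/sh
# Hypothetical helper (not from the original answer): runs the given
# command quietly and reports whether it exited successfully.
driver_ok() {
    if "$@" >/dev/null 2>&1; then
        echo "OK: $1 ran successfully"
        return 0
    else
        echo "FAIL: $1 exited with an error or was not found"
        return 1
    fi
}

# Usage after the reboot:
#   driver_ok nvidia-smi
```

A failing result here only says that the command did not succeed; the cause (missing module, mismatched kernel, and so on) still has to be tracked down separately.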
Answer 2:
The issue might be due to a confirmed "bug" in the 4.4.0-116 kernel patch. I ran into the same issue with nvidia-390. If you still want to use a newer version of the NVIDIA driver, I followed the instructions here and managed to solve the problem. In general, use the following steps:
- If you cannot log in to the desktop and get stuck in a login loop, press Ctrl + Alt + F1 to log in on the command line.
- Check whether your gcc version is outdated and, if so, update it:
gcc --version
- If the gcc version is 5+, uninstall the nvidia driver first:
sudo apt-get remove nvidia-390
- Purge the 4.4.0-116 kernel:
sudo apt-get purge linux-headers-4.4.0-116 linux-headers-4.4.0-116-generic linux-image-4.4.0-116-generic linux-image-extra-4.4.0-116-generic linux-signed-image-4.4.0-116-generic
- Reinstall the kernel:
sudo apt-get install linux-generic linux-signed-generic
- Reinstall nvidia-390:
sudo apt-get install nvidia-390
- Check whether the problem is solved, and make sure retpoline shows up this time:
modinfo nvidia-390 -k 4.4.0-116-generic | grep vermagic
- Reboot:
sudo reboot
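The gcc and vermagic checks in the steps above can be sketched as two small shell helpers. This is an illustration under assumptions, not part of the original answer: the function names gcc_major and has_retpoline are made up, and both work on text passed as an argument, so they can be tried without touching the system.

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative, not from the answer).

# Print the major version from a version string such as "5.4.0".
gcc_major() {
    echo "${1%%.*}"
}

# Succeed if the given vermagic line mentions retpoline.
has_retpoline() {
    case "$1" in
        *retpoline*) return 0 ;;
        *)           return 1 ;;
    esac
}

# Usage on the affected machine:
#   gcc_major "$(gcc -dumpversion)"
#   has_retpoline "$(modinfo nvidia-390 -k 4.4.0-116-generic | grep vermagic)" \
#       && echo "retpoline present" || echo "retpoline missing"
```

Keeping the parsing separate from the commands that produce the text makes it easy to paste in output captured from another machine.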
Hope this works for you and other people who run into the same issue. The post in the forum saved my weekend.
Answer 3:
To install the latest driver as of this answer:
sudo apt install libnvidia-compute-435
sudo apt install libnvidia-gl-435 nvidia-dkms-435 nvidia-kernel-source-435 \
    nvidia-utils-435 xserver-xorg-video-nvidia-435 libnvidia-ifr1-435
sudo apt install nvidia-driver-435
sudo reboot
and then:
nvidia-smi
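If nvidia-smi still fails after the reboot, it can help to check whether the kernel module was loaded at all. A minimal sketch, assuming a Linux system where loaded modules are listed in /proc/modules and assuming the module is named nvidia with this packaging (older packages such as nvidia-384 may use a versioned module name). The function name module_loaded and its optional second argument are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a module named $1 appears in the module
# list file $2 (defaults to /proc/modules, where the first field on each
# line is the module name).
module_loaded() {
    awk -v name="$1" '$1 == name { found = 1 } END { exit !found }' \
        "${2:-/proc/modules}"
}

# Usage:
#   module_loaded nvidia && echo "nvidia module loaded" \
#       || echo "nvidia module not loaded"
```

The optional file argument is there only so the check can be tried against a saved copy of /proc/modules.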
Answer 4:
If you're running this on Google Colab, just go to Runtime > Change Runtime Type > select GPU. That worked for me.
Source: https://stackoverflow.com/questions/49186723/error-nvidia-smi-has-failed-because-it-couldnt-communicate-with-the-nvidia-dri