Error: NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver

℡╲_俬逩灬. Submitted on 2020-08-06 10:43:10

Question


Running nvidia-smi throws this error:

NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

I purged the NVIDIA driver and installed it again, following the steps mentioned here.

My device specs are as follows:

  • Server with a Tesla M40
  • Running on Ubuntu 16.04
  • Kernel version Linux 4.4.0-116-generic x86_64
  • Driver: nvidia-384

Can someone please help in solving the error?


Answer 1:


Try the following:

  1. Download the driver from here
  2. sudo apt-get purge 'nvidia*' - to remove your current installation (the quotes keep the shell from expanding the glob against local files)
  3. sudo dpkg -i nvidia-diag-driver-local-repo-ubuntu1604_375.66-1_amd64.deb - to install the package you downloaded earlier
  4. sudo apt-get update
  5. sudo apt-get install cuda-drivers

After this, reboot your computer. When it is up again, the nvidia-smi command should run without the error.
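If nvidia-smi still fails after the reboot, it can help to check whether the driver's kernel module was loaded at all. The following is only a minimal sketch: the function name nvidia_module_loaded is made up for illustration, and it assumes the module name starts with "nvidia" (for example nvidia_384 on Ubuntu 16.04).

```shell
#!/usr/bin/env bash

# Hypothetical helper (not part of any NVIDIA tool): succeeds if text in
# lsmod format lists a module whose name starts with "nvidia".
nvidia_module_loaded() {
    # $1: output of `lsmod` (module name is the first column)
    printf '%s\n' "$1" | awk '$1 ~ /^nvidia/ { found = 1 } END { exit !found }'
}

if nvidia_module_loaded "$(lsmod 2>/dev/null)"; then
    echo "An nvidia kernel module is loaded."
else
    # A missing module usually means it was not built or installed for
    # the kernel that is currently running.
    echo "No nvidia kernel module is loaded for kernel $(uname -r)."
fi
```

If no module is loaded, the driver was most likely not built for the running kernel, which is the situation the next answer deals with.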




Answer 2:


The issue might be due to a confirmed "bug" in the 4.4.0-116 kernel patch. I ran into the same issue with nvidia-390. If you still want to use a newer version of the NVIDIA driver, the instructions here solved the problem for me. In general, use the following steps:

  1. If you cannot log in to the desktop and get stuck in the login loop, press Ctrl + Alt + F1 to switch to a text console and log in there.
  2. Check whether the version of gcc is outdated and, if so, update it: gcc --version
  3. If the gcc version is 5+, uninstall the NVIDIA driver first: sudo apt-get remove nvidia-390
  4. Purge the 4.4.0-116 kernel: sudo apt-get purge linux-headers-4.4.0-116 linux-headers-4.4.0-116-generic linux-image-4.4.0-116-generic linux-image-extra-4.4.0-116-generic linux-signed-image-4.4.0-116-generic
  5. Reinstall the kernel: sudo apt-get install linux-generic linux-signed-generic
  6. Reinstall nvidia-390: sudo apt-get install nvidia-390
  7. Check whether the problem is solved with modinfo nvidia-390 -k 4.4.0-116-generic | grep vermagic, and make sure retpoline shows up this time
  8. Reboot: sudo reboot
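The vermagic check in step 7 can also be scripted. This is a rough sketch under a few assumptions: the function name has_retpoline is invented here, the module name nvidia-390 is taken from the steps above and may differ on your system, and the check simply looks for the word retpoline in the vermagic string.

```shell
#!/usr/bin/env bash

# Hypothetical helper: succeeds if a vermagic string contains the
# "retpoline" flag as a separate word.
has_retpoline() {
    case " $1 " in
        *" retpoline "*) return 0 ;;
        *) return 1 ;;
    esac
}

module="nvidia-390"   # assumption: module name as used in the steps above
kernel="$(uname -r)"  # the running kernel, e.g. 4.4.0-116-generic

# modinfo -F prints a single field; -k selects which kernel's modules to inspect.
vermagic="$(modinfo -k "$kernel" -F vermagic "$module" 2>/dev/null)"

if has_retpoline "$vermagic"; then
    echo "$module for $kernel was built with retpoline."
else
    echo "$module for $kernel has no retpoline flag, or the module was not found."
fi
```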

Hope this works for you and other people who run into the same issue. The post in the forum saved my weekend.




Answer 3:


To install the latest driver as of this answer:

    sudo apt install libnvidia-compute-435
    sudo apt install libnvidia-gl-435 nvidia-dkms-435 nvidia-kernel-source-435 \
        nvidia-utils-435 xserver-xorg-video-nvidia-435 libnvidia-ifr1-435
    sudo apt install nvidia-driver-435
    sudo reboot

And then:

    nvidia-smi
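To make that last check a little more informative, something along these lines could wrap the call. It is only a sketch: the function name is_driver_comm_error is hypothetical, and it matches on the error text quoted in the question rather than on any documented exit code.

```shell
#!/usr/bin/env bash

# Hypothetical helper: succeeds if the given text contains the
# "couldn't communicate" message quoted in the question.
is_driver_comm_error() {
    case "$1" in
        *"couldn't communicate with the NVIDIA driver"*) return 0 ;;
        *) return 1 ;;
    esac
}

if ! command -v nvidia-smi >/dev/null 2>&1; then
    echo "nvidia-smi was not found on PATH."
else
    output="$(nvidia-smi 2>&1)"
    if is_driver_comm_error "$output"; then
        echo "Driver not reachable; check that the module is built for kernel $(uname -r)."
    else
        # Print whatever nvidia-smi reported (normally the GPU table).
        printf '%s\n' "$output"
    fi
fi
```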



Answer 4:


If you're running this on Google Colab, just go to Runtime > Change Runtime Type > select GPU. That worked for me.



Source: https://stackoverflow.com/questions/49186723/error-nvidia-smi-has-failed-because-it-couldnt-communicate-with-the-nvidia-dri
