When I run nvidia-smi
I get the following message:
Failed to initialize NVML: Driver/library version mismatch
An hour ago I receiv
A reboot usually fixes the issue on Ubuntu 18.04.
The “Failed to initialize NVML: Driver/library version mismatch” error generally means the NVIDIA kernel driver that is currently loaded comes from an older release than the user-space libraries now installed on disk. Rebooting the compute nodes will generally resolve this issue, because the matching kernel module is loaded at boot.
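To check whether this is what happened, you can compare the version of the loaded kernel module with the version of the module on disk. This is only a rough sketch: the helper names `first_version` and `compare_versions` are made up for illustration, and it assumes the module is called `nvidia` and that the on-disk module was installed together with the user-space libraries.

```shell
#!/bin/sh
# Extract the first dotted version number (e.g. 440.82) from text on stdin.
first_version() {
    grep -oE '[0-9]+\.[0-9]+(\.[0-9]+)?' | head -n 1
}

# Print "match" if both version strings are non-empty and equal, else "mismatch".
compare_versions() {
    if [ -n "$1" ] && [ "$1" = "$2" ]; then
        echo "match"
    else
        echo "mismatch"
    fi
}

# Only query the system when the driver's proc file exists.
if [ -r /proc/driver/nvidia/version ]; then
    # Version of the kernel module that is loaded right now.
    loaded=$(first_version < /proc/driver/nvidia/version)
    # Version of the module file on disk (assumes the module name is "nvidia").
    ondisk=$(modinfo -F version nvidia 2>/dev/null)
    echo "loaded=$loaded on-disk=$ondisk: $(compare_versions "$loaded" "$ondisk")"
fi
```

If the two versions differ, a reboot (or reloading the module) should pick up the newer one.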
In my case, I had installed the NVIDIA driver first and then CUDA. I found it can be fixed by installing only CUDA: https://developer.nvidia.com/cuda-toolkit
Reboot. If the problem still exists:
sudo rmmod nvidia_drm
sudo rmmod nvidia_modeset
sudo rmmod nvidia
nvidia-smi
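The rmmod order above matters: modules that depend on nvidia have to go first. Here is a small dry-run sketch that prints the rmmod commands only for modules that are actually loaded. The function name `modules_to_unload` is hypothetical, and `nvidia_uvm` is my own addition to the list (an assumption that it may be loaded on CUDA systems); it is not in the commands above.

```shell
#!/bin/sh
# Unload order: dependents first, the core nvidia module last.
# nvidia_uvm is an addition that is not in the original commands (assumption).
UNLOAD_ORDER="nvidia_uvm nvidia_drm nvidia_modeset nvidia"

# Read lsmod-style text on stdin; print the modules from UNLOAD_ORDER
# that appear in it, in unload order.
modules_to_unload() {
    # Skip the header line and keep only the module name column.
    loaded=$(awk 'NR > 1 { print $1 }')
    for mod in $UNLOAD_ORDER; do
        if printf '%s\n' "$loaded" | grep -qx "$mod"; then
            echo "$mod"
        fi
    done
}

# Dry run: print the commands instead of executing them.
if command -v lsmod >/dev/null 2>&1; then
    lsmod 2>/dev/null | modules_to_unload | while read -r mod; do
        echo "sudo rmmod $mod"
    done
fi
```

rmmod will still refuse to remove a module that is in use, so stop anything using the GPU (display manager, CUDA jobs) first.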
For CentOS/RHEL:
cd /boot
mv initramfs-$(uname -r).img /boot/initramfs-$(uname -r).img.bak
dracut -vf initramfs-$(uname -r).img $(uname -r)
then
reboot
For Debian/Ubuntu:
update-initramfs -u
If the problem persists:
apt install -y dkms && dkms install -m nvidia -v 440.82
Change 440.82 to your actual version.
tip: get the Nvidia driver version:
ls /usr/src
you will find the Nvidia driver dir such as nvidia-440.82
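The tip above can be scripted. This is a minimal sketch that assumes the source directories are named nvidia-<version> as in the example; the function name `nvidia_src_versions` is made up. It only prints the dkms command and does not install anything.

```shell
#!/bin/sh
# Read directory names on stdin; print the version part of names
# such as nvidia-440.82 and ignore everything else.
nvidia_src_versions() {
    sed -n 's/^nvidia-\([0-9][0-9.]*\)$/\1/p'
}

# Print a dkms command for each version found under /usr/src.
if [ -d /usr/src ]; then
    ls /usr/src | nvidia_src_versions | while read -r ver; do
        echo "sudo dkms install -m nvidia -v $ver"
    done
fi
```

If more than one version shows up, that in itself is a hint that two driver releases are installed side by side.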
You can also remove all NVIDIA packages and reinstall the driver:
apt purge 'nvidia*'
apt purge '*cuda*'
#check
apt list -i |grep nvidia
apt list -i |grep cuda
The top two answers did not solve my problem. I found a solution on the official NVIDIA forum that did. The error below may be caused by installing two different versions of the driver through different methods, for example installing the NVIDIA driver with both apt and the official installer.
Failed to initialize NVML: Driver/library version mismatch
To solve this problem, you only need to run one of the following two commands (the first removes an apt install, the second removes an install made with the official installer):
sudo apt-get --purge remove "*nvidia*"
sudo /usr/bin/nvidia-uninstall
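If you are not sure which install method was used, the following sketch looks for traces of each. It assumes a Debian/Ubuntu system with dpkg, and that the official installer left its uninstaller at /usr/bin/nvidia-uninstall as in the command above; the two function names are hypothetical.

```shell
#!/bin/sh
# Print "runfile" if the given uninstaller path is an executable file.
has_runfile_install() {
    if [ -x "$1" ]; then
        echo "runfile"
    fi
}

# Read package names on stdin; print "package" if any contains "nvidia".
has_package_install() {
    if grep -q 'nvidia'; then
        echo "package"
    fi
}

# Check for the uninstaller left behind by the official installer.
if [ -n "$(has_runfile_install /usr/bin/nvidia-uninstall)" ]; then
    echo "official installer traces found: /usr/bin/nvidia-uninstall"
fi

# Check for fully installed (state "ii") nvidia packages.
if command -v dpkg >/dev/null 2>&1; then
    if [ -n "$(dpkg -l 2>/dev/null | awk '$1 == "ii" { print $2 }' | has_package_install)" ]; then
        echo "apt/dpkg nvidia packages found"
    fi
fi
```

If both lines are printed, the driver was probably installed twice, which matches the cause described above.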
I experienced this problem after a normal kernel update on a CentOS machine. Since all CUDA and nVidia drivers and libraries had been installed via YUM repositories, I managed to solve the issue using the following steps:
sudo yum remove nvidia-driver-*
sudo reboot
sudo yum install nvidia-driver-cuda nvidia-modprobe
sudo modprobe nvidia # or just reboot
This made sure my kernel and my nVidia driver were consistent. I reckon that just rebooting may result in the wrong version of the kernel module being loaded.
Surprise surprise, rebooting solved the issue (I thought I had already tried that).
The solution Robert Crovella mentioned in the comments may also be useful to someone else, since it's pretty similar to what I did to solve the issue the first time I had it.