When I run nvidia-smi, I get the following message:
Failed to initialize NVML: Driver/library version mismatch
An hour ago I receiv
As @etal said, rebooting can solve this problem, but I think a procedure without rebooting will help.
For a Chinese version (中文版), check my blog.
The error message
NVML: Driver/library version mismatch
tells us that the Nvidia driver kernel module (kmod) has the wrong version, so we should unload this driver and then load the correct version of the kmod.
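Before unloading anything, it can help to confirm the mismatch. Here is a minimal sketch, assuming the loaded module exposes its version in /sys/module/nvidia/version and that the module file on disk was updated together with the userspace libraries. The function name report_nvidia_versions is my own, not part of any Nvidia tool.

```shell
# Compare two driver version strings and report whether they match.
# $1: version of the module currently loaded in the kernel (may be empty)
# $2: version of the module file that modprobe would load from disk
report_nvidia_versions() {
  loaded=$1
  ondisk=$2
  if [ -z "$loaded" ]; then
    echo "no nvidia module loaded"
  elif [ "$loaded" = "$ondisk" ]; then
    echo "match: $loaded"
  else
    echo "mismatch: loaded $loaded, installed $ondisk"
  fi
}
```

On a real machine you could call it as report_nvidia_versions "$(cat /sys/module/nvidia/version 2>/dev/null)" "$(modinfo -F version nvidia 2>/dev/null)". If it reports a mismatch, the running kmod is older or newer than what is installed, which is the situation the steps below address.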
First, we should know which drivers are loaded.
lsmod | grep nvidia
You may get something like:
nvidia_uvm 634880 8
nvidia_drm 53248 0
nvidia_modeset 790528 1 nvidia_drm
nvidia 12312576 86 nvidia_modeset,nvidia_uvm
Our final goal is to unload the nvidia module, so we should first unload the modules that depend on nvidia:
sudo rmmod nvidia_drm
sudo rmmod nvidia_modeset
sudo rmmod nvidia_uvm
Then unload nvidia:
sudo rmmod nvidia
If you get an error like rmmod: ERROR: Module nvidia is in use, which indicates that the kernel module is in use, you should kill the processes that are using the kmod:
sudo lsof /dev/nvidia*
Then kill those processes and continue unloading the kmods.
Confirm that you successfully unloaded those kmods:
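To get just the process IDs out of the lsof listing, something like the following might work. It is a sketch that assumes the default lsof column layout (COMMAND, PID, USER, ...) and command names without spaces. The helper name pids_using_nvidia is made up for illustration.

```shell
# Print the unique PIDs found in `lsof` output read from stdin.
# The first line is assumed to be the column header and is skipped.
pids_using_nvidia() {
  awk 'NR > 1 { print $2 }' | sort -un
}
```

For example: sudo lsof /dev/nvidia* | pids_using_nvidia. Alternatively, lsof -t prints only PIDs. Review the list before killing anything, since it will usually include your display server.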
lsmod | grep nvidia
You should get nothing. Then confirm that you can load the correct driver:
nvidia-smi
You should get the correct output.
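The unload order above can also be worked out from the lsmod output itself rather than typed by hand. Below is a rough sketch, assuming the usual lsmod columns (name, size, use count, comma-separated users). The function nvidia_unload_order is hypothetical and only prints an order; it does not unload anything.

```shell
# Read lsmod-style text on stdin and print the nvidia* modules in an order
# that is safe for rmmod: a module is printed only after every nvidia module
# listed in its "Used by" column has been printed.
nvidia_unload_order() {
  awk '
    $1 ~ /^nvidia/ { mods[$1] = $4; pending++ }
    END {
      while (pending > 0) {
        progressed = 0
        for (m in mods) {
          if (m in done) continue
          n = split(mods[m], users, ",")
          blocked = 0
          for (i = 1; i <= n; i++)
            if ((users[i] in mods) && !(users[i] in done)) blocked = 1
          if (!blocked) { print m; done[m] = 1; pending--; progressed = 1 }
        }
        if (!progressed) break   # dependency loop or unexpected input: stop
      }
    }'
}
```

With the example listing above, nvidia_drm comes out before nvidia_modeset and nvidia comes out last. You could then feed it to rmmod, e.g. lsmod | nvidia_unload_order | while read -r m; do sudo rmmod "$m"; done, but check the printed order first.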
I committed the container into a Docker image, then created another container from that image, and the problem was gone.
If you've recently updated, a reboot might solve this problem.
sudo reboot
Rebooting solved it for me.
First I installed the Nvidia driver.
Next I installed cuda.
After that I got the "Driver/library version mismatch" error, but I could still see the cuda version, so I purged the Nvidia driver and reinstalled it.
Then it worked correctly.
I was having this problem and none of the other remedies worked. The error message was opaque, but checking dmesg was key:
[ 10.118255] NVRM: API mismatch: the client has the version 410.79, but
NVRM: this kernel module has the version 384.130. Please
NVRM: make sure that this kernel module and all NVIDIA driver
NVRM: components have the same version.
However, I had completely removed the 384 version and any remaining kernel drivers matching nvidia-384*. But even after a reboot, I was still getting this. Seeing this meant that the initramfs for my kernel still contained the 384 module, even though only 410 was installed. So I regenerated the initramfs:
# uname -a # find the kernel it's using
Linux blah 4.13.0-43-generic #48~16.04.1-Ubuntu SMP Thu May 17 12:56:46 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
# update-initramfs -c -k 4.13.0-43-generic # regenerate the initramfs for that kernel
# reboot
And then it worked.
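If you want to pull the two version numbers out of the dmesg text automatically, here is a small sketch. It assumes the message wording shown above ("the client has the version ..." and "this kernel module has the version ..."). The function name nvrm_mismatch_versions is my own.

```shell
# Read NVRM "API mismatch" text on stdin and print the two versions it names.
# The message spans several lines, so they are joined before matching.
nvrm_mismatch_versions() {
  tr '\n' ' ' |
    sed -n 's/.*client has the version \([0-9.]*[0-9]\),.*kernel module has the version \([0-9.]*[0-9]\)\..*/client=\1 kmod=\2/p'
}
```

Typical use would be dmesg | grep -A3 'API mismatch' | nvrm_mismatch_versions (dmesg may need sudo on some systems). If the kmod version is one you believe you removed, that points at stale module files or a stale initramfs.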
After removing 384, I still had 384 files in:
/var/lib/dkms/nvidia-XXX/XXX.YY/4.13.0-43-generic/x86_64/module
/lib/modules/4.13.0-43-generic/kernel/drivers
I recommend using the locate command (not installed by default) rather than searching the filesystem every time.
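If locate is not available, a plain find over the directories mentioned above can do the same job, just more slowly. This is only a sketch: it assumes leftover files have "nvidia" followed by the old version number somewhere in their path, and find_stale_nvidia_files is a hypothetical helper name.

```shell
# List regular files under the given directories whose path contains
# "nvidia" followed somewhere later by the given version string.
# $1: old driver version (e.g. 384); remaining args: directories to search
find_stale_nvidia_files() {
  version=$1
  shift
  find "$@" -type f -path "*nvidia*${version}*" 2>/dev/null | sort
}
```

For example: find_stale_nvidia_files 384 /var/lib/dkms /lib/modules. It only lists files. Decide for yourself what is safe to remove, preferably through the package manager or dkms.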