NVIDIA NVML Driver/library version mismatch

无人及你 2021-01-29 17:15

When I run nvidia-smi I get the following message:

Failed to initialize NVML: Driver/library version mismatch

An hour ago I receiv

19 answers
  • 2021-01-29 17:58

    A reboot usually fixes the issue on Ubuntu 18.04.

    The “Failed to initialize NVML: Driver/library version mismatch” error generally means that the NVIDIA kernel module still loaded comes from an older driver release than the user-space driver libraries now installed, typically after a driver or CUDA update. Rebooting the compute nodes generally resolves the issue.
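
    To check whether a reboot is actually pending, one option is to compare the version of the kernel module that is currently loaded with the version of the module installed on disk. The sketch below assumes the loaded version is reported in /proc/driver/nvidia/version and that `modinfo -F version nvidia` reports the installed one; the function name `loaded_driver_version` is made up for illustration.

```shell
#!/bin/sh
# Print the version of the NVIDIA kernel module that is currently loaded.
# $1 (optional): file to parse, defaults to /proc/driver/nvidia/version.
loaded_driver_version() {
    awk '/^NVRM version:/ {
        # The version is the first field made only of digits and dots, e.g. 440.82
        for (i = 1; i <= NF; i++)
            if ($i ~ /^[0-9]+(\.[0-9]+)+$/) { print $i; exit }
    }' "${1:-/proc/driver/nvidia/version}"
}

# Only run the comparison on a machine where the driver is actually loaded.
if [ -r /proc/driver/nvidia/version ]; then
    loaded=$(loaded_driver_version)
    installed=$(modinfo -F version nvidia 2>/dev/null) || installed=unknown
    if [ "$loaded" = "$installed" ]; then
        echo "loaded and installed module versions agree: $loaded"
    else
        echo "mismatch: loaded=$loaded installed=$installed"
    fi
fi
```

    If the two versions differ, the running kernel is still using the old module, and a reboot (or unloading and reloading the module) should bring them back in line.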

  • 2021-01-29 17:58

    In my case, I had installed the NVIDIA driver first and then CUDA. I found the problem can be fixed by installing CUDA alone: https://developer.nvidia.com/cuda-toolkit

  • 2021-01-29 18:00

    Reboot. If the problem still exists, unload the NVIDIA kernel modules and try again:

    sudo rmmod nvidia_drm
    sudo rmmod nvidia_modeset
    sudo rmmod nvidia
    nvidia-smi
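
    The rmmod sequence above only works when nothing else still depends on those modules; nvidia_uvm, for example, has to be removed before nvidia when it is loaded. Here is a minimal sketch that lists the loaded NVIDIA modules in an order that is safe for unloading, assuming the /proc/modules format (module name in the first column). `nvidia_unload_order` is a hypothetical helper name.

```shell
#!/bin/sh
# Print loaded NVIDIA modules, dependents first and the core "nvidia" module last.
# $1 (optional): file in /proc/modules format, defaults to /proc/modules.
nvidia_unload_order() {
    for mod in nvidia_uvm nvidia_drm nvidia_modeset nvidia; do
        # The first column of /proc/modules is the module name.
        if awk -v m="$mod" '$1 == m { found = 1 } END { exit !found }' \
               "${1:-/proc/modules}"; then
            echo "$mod"
        fi
    done
}

# Dry run: show the commands instead of executing them.
for mod in $(nvidia_unload_order 2>/dev/null); do
    echo "sudo rmmod $mod"
done
```

    rmmod still fails if a process (an X server or a CUDA job, say) holds the device open, so those have to be stopped first.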
    

    For CentOS/RHEL, back up and rebuild the initramfs:

    cd /boot
    mv initramfs-$(uname -r).img /boot/initramfs-$(uname -r).img.bak
    dracut -vf initramfs-$(uname -r).img $(uname -r)
    

    Then:

    reboot
    

    For Debian/Ubuntu:

    update-initramfs -u
    

    If the problem persists:

    apt install -y dkms && dkms install -m nvidia -v 440.82
    

    Change 440.82 to your actual version.

    Tip: to find the NVIDIA driver version:

    ls /usr/src
    

    You will find the NVIDIA driver directory there, for example nvidia-440.82.
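
    The lookup and the dkms call can be combined. This is a sketch only, assuming the driver sources sit in directories named nvidia-<version> under /usr/src as described above; `nvidia_src_versions` is a hypothetical helper.

```shell
#!/bin/sh
# Print the version suffix of every nvidia-<version> directory under a source root.
# $1 (optional): directory to scan, defaults to /usr/src.
nvidia_src_versions() {
    for dir in "${1:-/usr/src}"/nvidia-[0-9]*; do
        # Skip the literal pattern when nothing matches.
        [ -d "$dir" ] || continue
        basename "$dir" | sed 's/^nvidia-//'
    done
}

# Dry run: print the dkms command for each version found.
for ver in $(nvidia_src_versions); do
    echo "dkms install -m nvidia -v $ver"
done
```

    If more than one version is printed, stale sources from an earlier driver are still present, which is worth cleaning up before rebuilding.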


    Alternatively, remove all NVIDIA packages and reinstall the driver:

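    # Quote the patterns so the shell passes them to apt unexpanded
    apt purge 'nvidia*'
    apt purge '*cuda*'

    # Check that nothing is left installed
    apt list --installed | grep nvidia
    apt list --installed | grep cuda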
    
  • 2021-01-29 18:01

    The top two answers did not solve my problem. A solution I found on the official NVIDIA forum did. The error below may be caused by installing two different versions of the driver through different methods, for example once via apt and once via the official installer.

    Failed to initialize NVML: Driver/library version mismatch

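    To solve this problem, run one of the following two commands. The first removes the packages installed through apt; the second removes a driver installed by the official installer.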

    sudo apt-get --purge remove "*nvidia*"
    
    sudo /usr/bin/nvidia-uninstall
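
    To decide which of the two commands applies, it can help to check how the driver appears to have been installed. This is a rough sketch, assuming that the official installer leaves its uninstaller at /usr/bin/nvidia-uninstall (the path used above) and that apt-installed packages show up in `dpkg -l`; `driver_install_methods` is a hypothetical helper.

```shell
#!/bin/sh
# Report how an NVIDIA driver appears to have been installed.
# $1 (optional): path of the installer's uninstaller,
#                defaults to /usr/bin/nvidia-uninstall.
driver_install_methods() {
    if [ -x "${1:-/usr/bin/nvidia-uninstall}" ]; then
        echo "runfile"
    fi
    # dpkg -l marks installed packages with a leading "ii".
    if command -v dpkg >/dev/null 2>&1 &&
       dpkg -l '*nvidia*' 2>/dev/null | grep -q '^ii'; then
        echo "apt"
    fi
    return 0
}

driver_install_methods
```

    If both "runfile" and "apt" are printed, two installations probably overlap, which is the situation described above.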
    
  • 2021-01-29 18:04

    I experienced this problem after a normal kernel update on a CentOS machine. Since all CUDA and NVIDIA drivers and libraries had been installed via YUM repositories, I managed to solve the issue with the following steps:

    sudo yum remove 'nvidia-driver-*'
    sudo reboot
    sudo yum install nvidia-driver-cuda nvidia-modprobe
    sudo modprobe nvidia # or just reboot
    

    This made sure that my kernel and my NVIDIA driver were consistent. I reckon that just rebooting may result in the wrong version of the kernel module being loaded.
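
    One way to check that consistency after a kernel update is to compare the kernel release the installed module was built for with the running kernel. This is a sketch, assuming `modinfo -F vermagic nvidia` prints the build kernel release as the first word of its output; `vermagic_matches` is a hypothetical helper.

```shell
#!/bin/sh
# Succeed when a module's vermagic string was built for the given kernel release.
# $1: vermagic string, e.g. the output of `modinfo -F vermagic nvidia`
# $2: kernel release, e.g. the output of `uname -r`
vermagic_matches() {
    # The kernel release is the first word of the vermagic string.
    [ "${1%% *}" = "$2" ]
}

# Only report when modinfo can find an nvidia module for the running kernel.
if vermagic=$(modinfo -F vermagic nvidia 2>/dev/null); then
    if vermagic_matches "$vermagic" "$(uname -r)"; then
        echo "nvidia module was built for the running kernel"
    else
        echo "nvidia module targets ${vermagic%% *}, running kernel is $(uname -r)"
    fi
fi
```

    No output at all means modinfo found no nvidia module for the running kernel, which points to the same reinstall steps.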

  • 2021-01-29 18:08

    Surprise surprise, rebooting solved the issue (I thought I had already tried that).

    The solution Robert Crovella mentioned in the comments may also be useful to someone else, since it's pretty similar to what I did to solve the issue the first time I had it.
