NVIDIA NVML Driver/library version mismatch

无人及你 2021-01-29 17:15

When I run nvidia-smi I get the following message:

Failed to initialize NVML: Driver/library version mismatch

An hour ago I receiv

19 Answers
  • 2021-01-29 17:52

    As @etal said, rebooting can solve this problem, but I think a procedure without rebooting will help.

    For Chinese, check my blog -> 中文版

    The error message

    NVML: Driver/library version mismatch

    tells us that the loaded NVIDIA kernel module (kmod) does not match the version of the user-space driver library, so we should unload the module and then load the correct version of the kmod.

    How do we do that?

    First, we should know which drivers are loaded.

    lsmod | grep nvidia

    You may get something like:

    nvidia_uvm            634880  8
    nvidia_drm             53248  0
    nvidia_modeset        790528  1 nvidia_drm
    nvidia              12312576  86 nvidia_modeset,nvidia_uvm
    

    Our final goal is to unload the nvidia module, so we first have to unload the modules that depend on it:

    sudo rmmod nvidia_drm
    sudo rmmod nvidia_modeset
    sudo rmmod nvidia_uvm

    Then unload nvidia itself:

    sudo rmmod nvidia
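
    The unload order above can also be derived from the lsmod output itself. The following is only a sketch and not part of the original answer: the function name nvidia_unload_order is made up for illustration, and it assumes the fourth lsmod column is the comma-separated list of modules that use each module.

```shell
# Read `lsmod` output on stdin and print the nvidia* modules in an order
# that is safe for rmmod: a module is printed only after every module
# listed in its "Used by" column has been printed.
nvidia_unload_order() {
    awk '
        $1 ~ /^nvidia/ {
            names[++n] = $1
            users[$1] = $4          # may be empty when nothing depends on it
        }
        END {
            remaining = n
            while (remaining > 0) {
                progressed = 0
                for (i = 1; i <= n; i++) {
                    m = names[i]
                    if (m in gone) continue
                    blocked = 0
                    k = split(users[m], u, ",")
                    for (j = 1; j <= k; j++)
                        if (u[j] != "" && !(u[j] in gone)) blocked = 1
                    if (!blocked) { print m; gone[m] = 1; remaining--; progressed = 1 }
                }
                # Nothing could be printed in this pass: some module is used
                # by a module outside the nvidia* set (or there is a cycle).
                if (!progressed) exit 1
            }
        }
    '
}

# Possible usage (not run here, it really unloads the driver):
#   lsmod | nvidia_unload_order | while read -r m; do sudo rmmod "$m"; done
```

    The function only prints names; the rmmod loop is left as a comment so the order can be checked before anything is unloaded.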

    Troubleshooting

    If you get an error like rmmod: ERROR: Module nvidia is in use, the kernel module is still being used by some process. Find the processes that are using the kmod:

    sudo lsof /dev/nvidia*

    Kill those processes, then continue unloading the kmods.
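
    To see at a glance which processes to stop, the lsof output can be reduced to a list of PIDs. This is a small sketch under the assumption that lsof prints its default table (header line first, PID in the second column); pids_from_lsof is a hypothetical helper name.

```shell
# Read default `lsof` output on stdin and print each PID once,
# sorted numerically. The header line is skipped.
pids_from_lsof() {
    awk 'NR > 1 && $2 ~ /^[0-9]+$/ { print $2 }' | sort -un
}

# Possible usage (not run here):
#   sudo lsof /dev/nvidia* | pids_from_lsof
# Review the list before killing anything; a display server such as Xorg
# may be among the processes.
```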

    Test

    Confirm that you successfully unloaded those kmods:

    lsmod | grep nvidia

    You should get no output. Then confirm that the correct driver can be loaded:

    nvidia-smi

    You should now get the normal nvidia-smi output.
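
    As an extra check, the version of the module that is actually loaded can be compared with the one installed on disk. This is a sketch, not part of the original answer. It assumes the driver exposes /proc/driver/nvidia/version with an "NVRM version:" line and that the module is named nvidia for modinfo; loaded_nvidia_version is a made-up helper name.

```shell
# Print the first dotted version number found on the "NVRM version" line
# of the text given on stdin.
loaded_nvidia_version() {
    awk '/^NVRM version:/ {
        for (i = 1; i <= NF; i++)
            if ($i ~ /^[0-9]+\.[0-9]+(\.[0-9]+)?$/) { print $i; exit }
    }'
}

# Possible usage on a machine with the driver loaded (not run here):
#   loaded=$(loaded_nvidia_version < /proc/driver/nvidia/version)
#   ondisk=$(modinfo -F version nvidia)
#   if [ "$loaded" = "$ondisk" ]; then echo "versions match"
#   else echo "mismatch: loaded $loaded, on disk $ondisk"; fi
```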

  • 2021-01-29 17:52

    I committed the container to a Docker image. Then I created a new container from that image, and the problem was gone.

  • 2021-01-29 17:53

    If you've recently updated, a reboot might solve this problem.

  • 2021-01-29 17:53
    sudo reboot
    

    Rebooting solved it for me.

  • 2021-01-29 17:53

    First I installed the Nvidia driver.

    Next I installed cuda.

    After that I got the "Driver/library version mismatch" error, although I could still see the CUDA version, so I purged the Nvidia driver and reinstalled it.

    Then it worked correctly.

  • 2021-01-29 17:54

    I was having this problem, and none of the other remedies worked. The error message was opaque, but checking dmesg was key:

    [   10.118255] NVRM: API mismatch: the client has the version 410.79, but
               NVRM: this kernel module has the version 384.130.  Please
               NVRM: make sure that this kernel module and all NVIDIA driver
               NVRM: components have the same version.
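
    The two versions can be pulled out of that message with a short filter. This is a sketch that assumes the wording shown in the log above; api_mismatch_versions is a hypothetical helper name.

```shell
# Read kernel log text on stdin and print the two versions named in an
# "NVRM: API mismatch" message. Exits with status 1 if either is missing.
api_mismatch_versions() {
    awk '
        match($0, /client has the version [0-9.]+/) {
            v = substr($0, RSTART, RLENGTH)
            sub(/.* /, "", v); sub(/\.$/, "", v)   # keep only the number
            client = v
        }
        match($0, /kernel module has the version [0-9.]+/) {
            v = substr($0, RSTART, RLENGTH)
            sub(/.* /, "", v); sub(/\.$/, "", v)   # drop the sentence-ending dot
            kmod = v
        }
        END {
            if (client != "" && kmod != "") print "client=" client " kmod=" kmod
            else exit 1
        }
    '
}

# Possible usage (dmesg may need root on some systems):
#   dmesg | api_mismatch_versions
```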
    

    However, I had completely removed the 384 version, along with any remaining nvidia-384* kernel driver packages, and even after a reboot I was still getting this message. This suggested that the initramfs for the running kernel still contained the 384 module, while user space had only 410. So I regenerated the initramfs for my kernel:

    # uname -a # find the kernel in use
    Linux blah 4.13.0-43-generic #48~16.04.1-Ubuntu SMP Thu May 17 12:56:46 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
    # update-initramfs -c -k 4.13.0-43-generic # rebuild the initramfs for that kernel
    # reboot
    

    And then it worked.

    After removing 384, I still had 384 files in:

    /var/lib/dkms/nvidia-XXX/XXX.YY/4.13.0-43-generic/x86_64/module
    /lib/modules/4.13.0-43-generic/kernel/drivers

    I recommend using the locate command (not installed by default) rather than searching the filesystem every time.
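
    If locate is not available, a search limited to the module directories is still quick. This is a sketch only: find_nvidia_modules is a made-up helper, and the nvidia*.ko* file name pattern is an assumption about how the module files are named on your system.

```shell
# List files under the given directory whose names look like NVIDIA
# kernel modules (nvidia*.ko, optionally compressed), sorted by path.
find_nvidia_modules() {
    find "$1" -type f -name 'nvidia*.ko*' 2>/dev/null | sort
}

# Possible usage (not run here):
#   find_nvidia_modules "/lib/modules/$(uname -r)"
#   find_nvidia_modules /var/lib/dkms
```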
