Old docker containers are not usable (no GPU) after updating the GPU driver in the host machine

Submitted by 一笑奈何 on 2020-08-09 18:42:06

Question


Today we updated the GPU driver on our host machine. Containers created after the update work fine, but every pre-existing Docker container gives the following error when nvidia-smi is run inside it:

Failed to initialize NVML: Driver/library version mismatch

How can we rescue these old containers? The GPU driver version on the host was previously 384.125 and is now 430.64.
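
The NVML error generally means that the user-space NVIDIA library visible to the process reports a different version from the kernel module currently loaded on the host. A minimal Python sketch of that comparison follows. It is illustrative rather than a fix: the helper names are hypothetical, the sample strings are built from the two versions quoted above rather than captured from this machine, and the library path /usr/local/nvidia/lib64 is an assumption about where an older container might see its driver libraries.

```python
import re
from typing import Iterable, Optional, Set


def kernel_module_version(proc_text: str) -> Optional[str]:
    """Extract the driver version from the text of /proc/driver/nvidia/version."""
    match = re.search(r"Kernel Module\s+(\d+(?:\.\d+)+)", proc_text)
    return match.group(1) if match else None


def library_versions(filenames: Iterable[str]) -> Set[str]:
    """Collect versions from names such as libnvidia-ml.so.384.125.

    Unversioned symlinks like libnvidia-ml.so.1 are ignored because the
    pattern requires at least one dot inside the version number.
    """
    pattern = re.compile(r"libnvidia-ml\.so\.(\d+(?:\.\d+)+)$")
    found = set()
    for name in filenames:
        m = pattern.search(name)
        if m:
            found.add(m.group(1))
    return found


def is_mismatched(proc_text: str, filenames: Iterable[str]) -> bool:
    """True when both versions are known and the kernel version has no matching library."""
    kernel = kernel_module_version(proc_text)
    libs = library_versions(filenames)
    return kernel is not None and bool(libs) and kernel not in libs


# Illustrative inputs only (assumed shapes, using the versions from the question).
SAMPLE_PROC = "NVRM version: NVIDIA UNIX x86_64 Kernel Module  430.64  (build date elided)"
SAMPLE_LIBS = [
    "/usr/local/nvidia/lib64/libnvidia-ml.so.1",
    "/usr/local/nvidia/lib64/libnvidia-ml.so.384.125",
]

if __name__ == "__main__":
    print("kernel module:", kernel_module_version(SAMPLE_PROC))
    print("libraries:", sorted(library_versions(SAMPLE_LIBS)))
    print("mismatch:", is_mismatched(SAMPLE_PROC, SAMPLE_LIBS))
```

On a real system the first argument would be the contents of /proc/driver/nvidia/version and the second a listing of the directory holding libnvidia-ml.so.* inside the old container, assuming both are readable there.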

Host Configuration

nvidia-smi gives

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.64       Driver Version: 430.64       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-DGXS...  Off  | 00000000:07:00.0  On |                    0 |
| N/A   40C    P0    39W / 300W |    182MiB / 32505MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-DGXS...  Off  | 00000000:08:00.0 Off |                    0 |
| N/A   40C    P0    39W / 300W |     12MiB / 32508MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla V100-DGXS...  Off  | 00000000:0E:00.0 Off |                    0 |
| N/A   39C    P0    40W / 300W |     12MiB / 32508MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla V100-DGXS...  Off  | 00000000:0F:00.0 Off |                    0 |
| N/A   40C    P0    38W / 300W |     12MiB / 32508MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1583      G   /usr/lib/xorg/Xorg                           169MiB |
+-----------------------------------------------------------------------------+

nvcc --version gives

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Sep__1_21:08:03_CDT_2017
Cuda compilation tools, release 9.0, V9.0.176

dpkg -l | grep -i docker gives

ii  dgx-docker-cleanup                         1.0-1                                           amd64        DGX Docker cleanup script
rc  dgx-docker-options                         1.0-7                                           amd64        DGX docker daemon options
ii  dgx-docker-repo                            1.0-1                                           amd64        docker repository configuration file
ii  docker-ce                                  5:18.09.2~3-0~ubuntu-xenial                     amd64        Docker: the open-source application container engine
ii  docker-ce-cli                              5:18.09.2~3-0~ubuntu-xenial                     amd64        Docker CLI: the open-source application container engine
ii  nvidia-container-runtime                   2.0.0+docker18.09.2-1                           amd64        NVIDIA container runtime
ii  nvidia-docker                              1.0.1-1                                         amd64        NVIDIA Docker container tools
rc  nvidia-docker2                             2.0.3+docker18.09.2-1                           all          nvidia-docker CLI wrapper
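
In dpkg -l output the first column is a two-letter status: ii marks a package that is installed, while rc marks one that has been removed with its configuration files left behind. Read that way, the listing above shows nvidia-docker 1.0.1 installed and nvidia-docker2 removed. The sketch below, with a hypothetical helper name and two lines copied from the listing, shows one way to filter for installed packages.

```python
from typing import Dict, Iterable


def installed_packages(dpkg_lines: Iterable[str]) -> Dict[str, str]:
    """Map package name to version for lines whose dpkg status is 'ii'."""
    result = {}
    for line in dpkg_lines:
        # Columns: status, name, version, architecture, description.
        fields = line.split(None, 4)
        if len(fields) >= 3 and fields[0] == "ii":
            result[fields[1]] = fields[2]
    return result


# Two lines copied from the dpkg listing in the question.
LISTING = [
    "ii  nvidia-docker                              1.0.1-1                                         amd64        NVIDIA Docker container tools",
    "rc  nvidia-docker2                             2.0.3+docker18.09.2-1                           all          nvidia-docker CLI wrapper",
]

if __name__ == "__main__":
    for name, version in installed_packages(LISTING).items():
        print(name, version)
```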

docker version gives

Client:
 Version:           18.09.2
 API version:       1.39
 Go version:        go1.10.6
 Git commit:        6247962
 Built:             Sun Feb 10 04:13:50 2019
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          18.09.2
  API version:      1.39 (minimum version 1.12)
  Go version:       go1.10.6
  Git commit:       6247962
  Built:            Sun Feb 10 03:42:13 2019
  OS/Arch:          linux/amd64
  Experimental:     false

Source: https://stackoverflow.com/questions/63079329/old-docker-containers-are-not-usable-no-gpu-after-updating-the-gpu-driver-in-t
