Using GPU from a docker container?

名媛妹妹 2020-11-27 09:12

I'm searching for a way to use the GPU from inside a docker container.

The container will execute arbitrary code, so I don't want to use privileged mode.

10 answers
  • 2020-11-27 09:16

    Recent enhancements by NVIDIA have produced a much more robust way to do this.

    Essentially they have found a way to avoid the need to install the CUDA/GPU driver inside the containers and have it match the host kernel module.

    Instead, drivers are on the host and the containers don't need them. It requires a modified docker-cli right now.

    This is great, because now containers are much more portable.

    A quick test on Ubuntu:

    # Install nvidia-docker and nvidia-docker-plugin
    wget -P /tmp https://github.com/NVIDIA/nvidia-docker/releases/download/v1.0.1/nvidia-docker_1.0.1-1_amd64.deb
    sudo dpkg -i /tmp/nvidia-docker*.deb && rm /tmp/nvidia-docker*.deb
    
    # Test nvidia-smi
    nvidia-docker run --rm nvidia/cuda nvidia-smi
    

    For more details, see "GPU-Enabled Docker Container" and https://github.com/NVIDIA/nvidia-docker

  • 2020-11-27 09:17

    Goal:

    My goal was to make a CUDA-enabled docker image without using nvidia/cuda as the base image, because I have a custom Jupyter image that I want to build on.

    Prerequisite:

    The host machine already had the NVIDIA driver, the CUDA toolkit, and nvidia-container-toolkit installed. Please refer to the official docs and to Rohit's answer.

    Test that the NVIDIA driver and CUDA toolkit are installed correctly by running nvidia-smi on the host machine. It should display the correct "Driver Version" and "CUDA Version" and show the GPU info.

    Test that nvidia-container-toolkit is installed correctly with: docker run --rm --gpus all nvidia/cuda:latest nvidia-smi

    Dockerfile

    I found what I assume to be the official Dockerfile for nvidia/cuda (see the links in the Dockerfile comments). I "flattened" it, appended the contents to my Dockerfile, and tested that it works nicely:

    FROM sidazhou/scipy-notebook:latest
    # FROM ubuntu:18.04 
    
    ###########################################################################
    # See https://gitlab.com/nvidia/container-images/cuda/-/blob/master/dist/10.1/ubuntu18.04-x86_64/base/Dockerfile
    # See https://sarus.readthedocs.io/en/stable/user/custom-cuda-images.html
    ###########################################################################
    USER root
    
    ###########################################################################
    # base
    RUN apt-get update && apt-get install -y --no-install-recommends \
        gnupg2 curl ca-certificates && \
        curl -fsSL https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub | apt-key add - && \
        echo "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 /" > /etc/apt/sources.list.d/cuda.list && \
        echo "deb https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64 /" > /etc/apt/sources.list.d/nvidia-ml.list && \
        apt-get purge --autoremove -y curl \
        && rm -rf /var/lib/apt/lists/*
    
    ENV CUDA_VERSION 10.1.243
    ENV CUDA_PKG_VERSION 10-1=$CUDA_VERSION-1
    
    # For libraries in the cuda-compat-* package: https://docs.nvidia.com/cuda/eula/index.html#attachment-a
    RUN apt-get update && apt-get install -y --no-install-recommends \
        cuda-cudart-$CUDA_PKG_VERSION \
        cuda-compat-10-1 \
        && ln -s cuda-10.1 /usr/local/cuda && \
        rm -rf /var/lib/apt/lists/*
    
    # Required for nvidia-docker v1
    RUN echo "/usr/local/nvidia/lib" >> /etc/ld.so.conf.d/nvidia.conf && \
        echo "/usr/local/nvidia/lib64" >> /etc/ld.so.conf.d/nvidia.conf
    
    ENV PATH /usr/local/nvidia/bin:/usr/local/cuda/bin:${PATH}
    ENV LD_LIBRARY_PATH /usr/local/nvidia/lib:/usr/local/nvidia/lib64
    
    
    ###########################################################################
    #runtime next
    ENV NCCL_VERSION 2.7.8
    
    RUN apt-get update && apt-get install -y --no-install-recommends \
        cuda-libraries-$CUDA_PKG_VERSION \
        cuda-npp-$CUDA_PKG_VERSION \
        cuda-nvtx-$CUDA_PKG_VERSION \
        libcublas10=10.2.1.243-1 \
        libnccl2=$NCCL_VERSION-1+cuda10.1 \
        && apt-mark hold libnccl2 \
        && rm -rf /var/lib/apt/lists/*
    
    # Keep apt from auto-upgrading the cublas package. See https://gitlab.com/nvidia/container-images/cuda/-/issues/88
    RUN apt-mark hold libcublas10
    
    
    ###########################################################################
    #cudnn7 (not cudnn8) next
    
    ENV CUDNN_VERSION 7.6.5.32
    
    RUN apt-get update && apt-get install -y --no-install-recommends \
        libcudnn7=$CUDNN_VERSION-1+cuda10.1 \
        && apt-mark hold libcudnn7 && \
        rm -rf /var/lib/apt/lists/*
    
    
    ENV NVIDIA_VISIBLE_DEVICES all
    ENV NVIDIA_DRIVER_CAPABILITIES all
    ENV NVIDIA_REQUIRE_CUDA "cuda>=10.1"
    
    
    ###########################################################################
    #docker build -t sidazhou/scipy-notebook-gpu:latest .
    
    #docker run -itd --gpus all \
    #  -p 8888:8888 \
    #  -p 6006:6006 \
    #  --user root \
    #  -e NB_UID=$(id -u) \
    #  -e NB_GID=$(id -g) \
    #  -e GRANT_SUDO=yes \
    #  -v ~/workspace:/home/jovyan/work \
    #  --name sidazhou-jupyter-gpu \
    #  sidazhou/scipy-notebook-gpu:latest
    
    #docker exec sidazhou-jupyter-gpu python -c "import tensorflow as tf; print(tf.config.experimental.list_physical_devices('GPU'))"
    
  • 2020-11-27 09:18

    OK, I finally managed to do it without using --privileged mode.

    I'm running Ubuntu Server 14.04 and using the latest CUDA (6.0.37 for Linux 13.04, 64-bit).


    Preparation

    Install the NVIDIA driver and CUDA on your host. (It can be a little tricky, so I suggest you follow this guide: https://askubuntu.com/questions/451672/installing-and-testing-cuda-in-ubuntu-14-04)

    ATTENTION: It is really important that you keep the files you used for the host CUDA installation.


    Get the Docker Daemon to run using lxc

    We need to run the docker daemon with the lxc driver to be able to modify the configuration and give the container access to the device.

    One-time use:

    sudo service docker stop
    sudo docker -d -e lxc
    

    Permanent configuration: modify your docker configuration file located at /etc/default/docker. Change the DOCKER_OPTS line by adding '-e lxc'. Here is my line after modification:

    DOCKER_OPTS="--dns 8.8.8.8 --dns 8.8.4.4 -e lxc"
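
    The edit to /etc/default/docker can be scripted. The following is a minimal sketch, assuming the file already contains an uncommented DOCKER_OPTS="..." line. The function name add_lxc_driver is hypothetical, and it takes the file path as an argument so it can be tried on a copy first.

```shell
# add_lxc_driver FILE
# Appends "-e lxc" inside the quotes of the DOCKER_OPTS line in FILE,
# unless that line already contains it. (Hypothetical helper name.)
add_lxc_driver() {
    conf_file=$1
    # Nothing to do if the option is already present on the DOCKER_OPTS line.
    if grep -q '^DOCKER_OPTS=.*-e lxc' "$conf_file"; then
        return 0
    fi
    # Rewrite DOCKER_OPTS="<anything>" to DOCKER_OPTS="<anything> -e lxc".
    sed 's/^DOCKER_OPTS="\(.*\)"$/DOCKER_OPTS="\1 -e lxc"/' "$conf_file" > "$conf_file.tmp" &&
        mv "$conf_file.tmp" "$conf_file"
}

# Example: try it on a copy before touching the real file.
# cp /etc/default/docker /tmp/docker.test && add_lxc_driver /tmp/docker.test
```

    Running it twice leaves the line unchanged the second time, so it is safe to call from a provisioning script.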
    

    Then restart the daemon using

    sudo service docker restart
    

    How do you check that the daemon is actually using the lxc driver?

    docker info
    

    The Execution Driver line should look like this:

    Execution Driver: lxc-1.0.5
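
    The same check can be done in a script. A minimal sketch, assuming docker info prints an "Execution Driver:" line in the form shown above. The function name uses_lxc_driver is hypothetical.

```shell
# uses_lxc_driver
# Reads `docker info` output on stdin and succeeds (exit status 0) only if
# the "Execution Driver" line names lxc. (Hypothetical helper name.)
uses_lxc_driver() {
    grep -q '^[[:space:]]*Execution Driver:[[:space:]]*lxc'
}

# Example:
# docker info | uses_lxc_driver && echo "lxc driver active"
```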
    

    Build your image with the NVIDIA and CUDA driver.

    Here is a basic Dockerfile to build a CUDA compatible image.

    FROM ubuntu:14.04
    MAINTAINER Regan <http://stackoverflow.com/questions/25185405/using-gpu-from-a-docker-container>
    
    RUN apt-get update && apt-get install -y build-essential
    RUN apt-get --purge remove -y nvidia*
    
    # Copy in the installer files you used to install CUDA and the NVIDIA driver on your host.
    ADD ./Downloads/nvidia_installers /tmp/nvidia
    # Install the driver.
    RUN /tmp/nvidia/NVIDIA-Linux-x86_64-331.62.run -s -N --no-kernel-module
    # The driver installer leaves temp files behind during a docker build (I don't know why),
    # and the CUDA installer fails if they are still there, so delete them.
    RUN rm -rf /tmp/selfgz7
    # Run the CUDA installer.
    RUN /tmp/nvidia/cuda-linux64-rel-6.0.37-18176142.run -noprompt
    # Install the CUDA samples; comment this line out if you don't want them.
    RUN /tmp/nvidia/cuda-samples-linux-6.0.37-18176142.run -noprompt -cudaprefix=/usr/local/cuda-6.0
    # Add the CUDA libraries to the library path (ENV persists; a RUN export would not).
    ENV LD_LIBRARY_PATH $LD_LIBRARY_PATH:/usr/local/cuda/lib64
    # Update the ld.so.conf.d directory.
    RUN touch /etc/ld.so.conf.d/cuda.conf
    # Delete the installer files.
    RUN rm -rf /tmp/*
    

    Run your image.

    First you need to identify the major number associated with your device. The easiest way is to run the following command:

    ls -la /dev | grep nvidia
    

    If the result is blank, launching one of the samples on the host should do the trick. In the listing, each device shows a pair of numbers between the group and the date. These are the major and minor numbers (in that order), and together they identify a device. We will use only the major numbers for convenience.

    Why did we activate the lxc driver? To use the lxc conf option that lets the container access those devices. The option is as follows (I recommend using * for the minor number because it shortens the run command):

    --lxc-conf='lxc.cgroup.devices.allow = c [major number]:[minor number or *] rwm'

    So, to launch a container (supposing your image name is cuda):

    docker run -ti --lxc-conf='lxc.cgroup.devices.allow = c 195:* rwm' --lxc-conf='lxc.cgroup.devices.allow = c 243:* rwm' cuda
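
    The major numbers can also be pulled out of the listing mechanically. The sketch below assumes the usual ls -l layout for character devices, where the fifth field is the major number followed by a comma. The function names nvidia_major_numbers and lxc_conf_flags are illustrative and not part of the original setup.

```shell
# nvidia_major_numbers
# Reads `ls -la /dev` output on stdin and prints the unique major numbers of
# character devices whose name contains "nvidia", one per line.
nvidia_major_numbers() {
    awk '$1 ~ /^c/ && $NF ~ /nvidia/ { sub(/,$/, "", $5); print $5 }' | sort -un
}

# lxc_conf_flags
# Reads major numbers on stdin and prints one --lxc-conf option per number,
# using * for the minor number.
lxc_conf_flags() {
    while read -r major; do
        printf '%s\n' "--lxc-conf='lxc.cgroup.devices.allow = c ${major}:* rwm'"
    done
}

# Example:
# ls -la /dev | nvidia_major_numbers | lxc_conf_flags
```

    The printed options can then be pasted into the docker run command in place of the hand-written ones.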
    
  • 2020-11-27 09:21

    We just released an experimental GitHub repository which should ease the process of using NVIDIA GPUs inside Docker containers.

  • 2020-11-27 09:25

    Use x11docker by mviereck:

    https://github.com/mviereck/x11docker#hardware-acceleration says

    Hardware acceleration

    Hardware acceleration for OpenGL is possible with option -g, --gpu.

    This will work out of the box in most cases with open source drivers on host. Otherwise have a look at wiki: feature dependencies. Closed source NVIDIA drivers need some setup and support less x11docker X server options.

    This script is really convenient as it handles all the configuration and setup. Running a docker image on X with gpu is as simple as

    x11docker --gpu imagename
    
  • 2020-11-27 09:33

    I would not recommend installing CUDA/cuDNN on the host if you can use docker. Since at least CUDA 8 it has been possible to "stand on the shoulders of giants" and use nvidia/cuda base images maintained by NVIDIA in their Docker Hub repo. Go for the newest and biggest one (with cuDNN if doing deep learning) if unsure which version to choose.

    A starter CUDA container:

    mkdir ~/cuda11
    cd ~/cuda11
    
    echo "FROM nvidia/cuda:11.0-cudnn8-devel-ubuntu18.04" > Dockerfile
    echo "CMD [\"/bin/bash\"]" >> Dockerfile
    
    docker build --tag mirekphd/cuda11 .
    
    docker run --rm -it --gpus 1 mirekphd/cuda11 nvidia-smi
    
    

    Sample output:

    (If nvidia-smi is not found in the container, do not try to install it there. It was already installed on the host with the NVIDIA GPU driver and should be made available from the host to the container if docker has access to the GPU(s).)

    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 450.57       Driver Version: 450.57       CUDA Version: 11.0     |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |                               |                      |               MIG M. |
    |===============================+======================+======================|
    |   0  GeForce GTX 108...  Off  | 00000000:01:00.0  On |                  N/A |
    |  0%   50C    P8    17W / 280W |    409MiB / 11177MiB |      7%      Default |
    |                               |                      |                  N/A |
    +-------------------------------+----------------------+----------------------+
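
    The driver and CUDA versions in the banner can be read from a script instead of by eye. A minimal sketch, assuming the banner layout shown in the sample output (other driver releases may format it differently). The function names smi_driver_version and smi_cuda_version are hypothetical.

```shell
# smi_driver_version / smi_cuda_version
# Read `nvidia-smi` output on stdin and print the value that follows
# "Driver Version:" or "CUDA Version:" in the banner line.
smi_driver_version() {
    sed -n 's/.*Driver Version: *\([0-9.]*\).*/\1/p' | head -n 1
}

smi_cuda_version() {
    sed -n 's/.*CUDA Version: *\([0-9.]*\).*/\1/p' | head -n 1
}

# Example:
# docker run --rm --gpus 1 mirekphd/cuda11 nvidia-smi | smi_cuda_version
```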
    

    Prerequisites

    1. An appropriate NVIDIA driver supporting the latest CUDA version must be installed on the host first (download it from NVIDIA Driver Downloads and then mv driver-file.run driver-file.sh && chmod +x driver-file.sh && ./driver-file.sh). These have been forward-compatible since CUDA 10.1.

    2. GPU access must be enabled in docker by installing nvidia-container-toolkit with sudo apt-get update && sudo apt-get install nvidia-container-toolkit (and then restarting the docker daemon using sudo systemctl restart docker).
