Installing TensorFlow GPU with Docker: A Hands-On Guide

Posted by 六眼飞鱼酱① on 2019-12-04 08:34:27

Background

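AI projects are springing up everywhere, and DevOps practice keeps maturing. Most serious open-source products now support two environments: Docker and Linux. This article explains how to install the TensorFlow Docker image on a CentOS 7 machine equipped with a GPU. In China, only engineers at a few large companies get to enjoy this kind of setup. If you have access to one, give it a try; after all, AI is one more skill to have.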

Installing nvidia-docker

NVIDIA provides nvidia-docker, a thin wrapper around Docker that lets containers use NVIDIA GPUs.
For the installation procedure, see:
https://github.com/NVIDIA/nvidia-docker

After installation, the following nvidia commands are available (shown here via tab completion):

[root@~]# nvidia-
nvidia-bug-report.sh     nvidia-debugdump         nvidia-installer         nvidia-settings          nvidia-xconfig
nvidia-cuda-mps-control  nvidia-docker            nvidia-modprobe          nvidia-smi               
nvidia-cuda-mps-server   nvidia-docker-plugin     nvidia-persistenced      nvidia-uninstall 

If you see the error below, the nvidia-docker service has not been started:

[root@ourui]# nvidia-docker run -it -p 8888:8888 tensorflow/tensorflow:latest-gpu
docker: Error response from daemon: create nvidia_driver_367.48: create nvidia_driver_367.48: Error looking up volume plugin nvidia-docker: legacy plugin: plugin not found.
See 'docker run --help'.

Use the following commands to check whether nvidia-docker is running and, if it is not, to start it:

[root@ourui]# systemctl status nvidia-docker
● nvidia-docker.service - NVIDIA Docker plugin
   Loaded: loaded (/usr/lib/systemd/system/nvidia-docker.service; disabled; vendor preset: disabled)
   Active: inactive (dead)
     Docs: https://github.com/NVIDIA/nvidia-docker/wiki
[root@ourui]# systemctl start nvidia-docker 
[root@ourui]# systemctl status nvidia-docker
● nvidia-docker.service - NVIDIA Docker plugin
   Loaded: loaded (/usr/lib/systemd/system/nvidia-docker.service; disabled; vendor preset: disabled)
   Active: active (running) since Mon 2017-03-27 10:39:16 CST; 2s ago
     Docs: https://github.com/NVIDIA/nvidia-docker/wiki
  Process: 51649 ExecStartPost=/bin/sh -c /bin/echo unix://$SOCK_DIR/nvidia-docker.sock > $SPEC_FILE (code=exited, status=0/SUCCESS)
  Process: 51644 ExecStartPost=/bin/sh -c /bin/mkdir -p $( dirname $SPEC_FILE ) (code=exited, status=0/SUCCESS)
 Main PID: 51643 (nvidia-docker-p)
   Memory: 13.9M
   CGroup: /system.slice/nvidia-docker.service
           └─51643 /usr/bin/nvidia-docker-plugin -s /var/lib/nvidia-docker

Mar 27 10:39:16 ctum2e1302005.idc.wanda-group.net systemd[1]: Starting NVIDIA Docker plugin...
Mar 27 10:39:16 ctum2e1302005.idc.wanda-group.net systemd[1]: Started NVIDIA Docker plugin.

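That completes the basic nvidia-docker setup. Note that NVIDIA does not provide an nvidia-docker build for the most recent Docker release; if you need to test against the latest Docker release, you will have to use another approach.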
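The status check and start shown in the session above can also be scripted. The following is a minimal sketch, not part of nvidia-docker itself: the function names `plugin_is_active` and `ensure_plugin_running` are made up for illustration, and it assumes a systemd host, root privileges, and the unit name `nvidia-docker` seen in the output above.

```python
import subprocess


def plugin_is_active(status_text: str) -> bool:
    """Return True if `systemctl status` output reports "Active: active (running)"."""
    for line in status_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("Active:"):
            return stripped.startswith("Active: active (running)")
    # No "Active:" line at all: treat the unit as not running.
    return False


def ensure_plugin_running(unit: str = "nvidia-docker") -> bool:
    """Start the unit if needed; return True if it is running afterwards.

    Hypothetical helper: requires root and systemd, as in the session above.
    """
    def status() -> str:
        # systemctl exits non-zero for inactive units, so check=True is not used here.
        result = subprocess.run(
            ["systemctl", "status", unit],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
        )
        return result.stdout

    if plugin_is_active(status()):
        return True
    subprocess.run(["systemctl", "start", unit], check=True)
    return plugin_is_active(status())
```

Because the unit is reported as `disabled` in the output above, it will not come up after a reboot; running `systemctl enable nvidia-docker` once makes it start at boot.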

Downloading the Docker image

The TensorFlow community publishes a set of images on Docker Hub:
https://hub.docker.com/r/tensorflow/tensorflow/

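For reasons we all know, pulling images from Docker Hub is sometimes a problem from within China. It reminds me of the line: it was the best of times, it was the worst of times. With a mortgage to pay, we had better find a way!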

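There are many Docker registries hosted in China, and you can of course use one of them directly. Some providers also offer "accelerators" (registry mirrors); you know what "acceleration" means here. Below we use the Aliyun accelerator: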

https://yq.aliyun.com/articles/29941
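The linked article describes the Aliyun-specific steps. As a rough sketch of what a mirror setup usually amounts to, the snippet below merges a mirror URL into the Docker daemon configuration. It assumes a Docker version that reads `/etc/docker/daemon.json` and honors its `registry-mirrors` key; the helper names are made up for illustration, and the mirror URL in the usage note is a placeholder, not a real endpoint.

```python
import json
from pathlib import Path


def add_registry_mirror(config: dict, mirror_url: str) -> dict:
    """Return a copy of a daemon.json-style dict with mirror_url under "registry-mirrors"."""
    updated = dict(config)
    mirrors = list(updated.get("registry-mirrors", []))
    if mirror_url not in mirrors:
        mirrors.append(mirror_url)
    updated["registry-mirrors"] = mirrors
    return updated


def write_daemon_config(mirror_url: str, path: str = "/etc/docker/daemon.json") -> None:
    """Merge the mirror into the daemon config file, creating it if missing (needs root)."""
    config_path = Path(path)
    text = config_path.read_text().strip() if config_path.exists() else ""
    existing = json.loads(text) if text else {}
    merged = add_registry_mirror(existing, mirror_url)
    config_path.write_text(json.dumps(merged, indent=2) + "\n")
```

For example, `write_daemon_config("https://mirror.example.invalid")` would add that placeholder URL while keeping any other settings already in the file. The Docker daemon has to be restarted (for instance with `systemctl restart docker`) before the change takes effect.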

Once that is configured, you can pull the Docker image directly:

nvidia-docker pull tensorflow/tensorflow:latest-gpu

Starting the container

[root@ourui]# nvidia-docker run -it -d -p  8888:8888 tensorflow/tensorflow:latest-gpu  
69fede4460082f3e4aa847fc34ac0f58e797dc44b10d65643a70d2a1e7e4ba03
[root@ourui]# 
[root@ourui]# nvidia-docker logs 69fede4460082f3e4aa847fc34ac0f58e797dc44b10d65643a70d2a1e7e4ba03
[I 02:45:08.016 NotebookApp] Writing notebook server cookie secret to /root/.local/share/jupyter/runtime/notebook_cookie_secret
[W 02:45:08.031 NotebookApp] WARNING: The notebook server is listening on all IP addresses and not using encryption. This is not recommended.
[I 02:45:08.037 NotebookApp] Serving notebooks from local directory: /notebooks
[I 02:45:08.037 NotebookApp] 0 active kernels 
[I 02:45:08.037 NotebookApp] The Jupyter Notebook is running at: http://[all ip addresses on your system]:8888/?token=f1d1717e2fdbf8c1807f5017315396be05a6b95310d87cb9
[I 02:45:08.038 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 02:45:08.038 NotebookApp] 

    Copy/paste this URL into your browser when you connect for the first time,
    to login with a token:
        http://localhost:8888/?token=f1d1717e2fdbf8c1807f5017315396be05a6b95310d87cb9
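The log output above contains the login token needed to open the notebook. When the container is started from a script, the token can be pulled out of the log text instead of being copied by hand. This is a minimal sketch under the assumption that the log contains a `?token=<hex>` URL like the one shown above; `extract_token` and `notebook_url` are illustrative names, not part of Jupyter or Docker.

```python
import re
from typing import Optional

# Matches the token query parameter as it appears in the Jupyter log lines above.
TOKEN_PATTERN = re.compile(r"[?&]token=([0-9a-f]+)")


def extract_token(log_text: str) -> Optional[str]:
    """Return the first notebook token found in the log text, or None."""
    match = TOKEN_PATTERN.search(log_text)
    return match.group(1) if match else None


def notebook_url(log_text: str, host: str, port: int = 8888) -> Optional[str]:
    """Build the browser URL for the given host from the token in the log."""
    token = extract_token(log_text)
    if token is None:
        return None
    return "http://{}:{}/?token={}".format(host, port, token)
```

The `log_text` argument would be the text printed by the `logs` command for the container ID. When capturing it from a script, collect both stdout and stderr, since the notebook server may write these lines to either stream.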

Testing

Open the notebook in a browser, replacing ip with the address of the Docker host:
http://ip:8888/?token=f1d1717e2fdbf8c1807f5017315396be05a6b95310d87cb9
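Opening the notebook only shows that Jupyter is reachable; it does not show that TensorFlow can actually see the GPU. One way to check is to run something like the following in a notebook cell. It is a sketch that assumes the TensorFlow version in the image exposes `tensorflow.python.client.device_lib.list_local_devices()`; the two function names are made up for illustration.

```python
def gpu_device_names(devices) -> list:
    """Return the names of all entries whose device_type is "GPU"."""
    return [d.name for d in devices if d.device_type == "GPU"]


def list_tensorflow_gpus() -> list:
    """Ask TensorFlow for its local devices and keep only the GPUs."""
    # Imported inside the function so the helper above can be loaded
    # on a machine that does not have TensorFlow installed.
    from tensorflow.python.client import device_lib
    return gpu_device_names(device_lib.list_local_devices())
```

Running `print(list_tensorflow_gpus())` inside the container should print at least one device name when the GPU is visible. An empty list suggests the container was started without GPU access, for example with plain `docker run` instead of `nvidia-docker run`.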
