Question
I encountered the "first-run slow-down" problem with GTX 1080 cards and nvidia-docker as discussed in this question.
I'm using the TensorFlow build from its official pip package and a custom docker image based on nvidia-docker's Ubuntu 16.04 base image.
How do I make TensorFlow load (and build JIT caches for) all registered CUDA kernels programmatically in a Dockerfile, rather than manually rebuilding TensorFlow with the TF_CUDA_COMPUTE_CAPABILITIES environment variable?
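For context, one way this warm-up could be attempted is a Dockerfile fragment like the sketch below. It assumes the build step actually has GPU access through the NVIDIA runtime (which plain `docker build` typically does not, so this is the crux of the problem), and `warmup.py` is a hypothetical placeholder script that exercises the TensorFlow ops the application uses. `CUDA_CACHE_PATH` and `CUDA_CACHE_MAXSIZE` are real CUDA environment variables controlling the JIT cache location and size; the size value here is illustrative.

```dockerfile
# Hypothetical sketch: persist CUDA's JIT cache inside the image by pointing
# CUDA_CACHE_PATH at a fixed location and running a warm-up script at build
# time. Only works if this RUN step can see a GPU; warmup.py is a placeholder.
ENV CUDA_CACHE_PATH=/opt/cuda-cache \
    CUDA_CACHE_MAXSIZE=2147483648
COPY warmup.py /tmp/warmup.py
RUN python /tmp/warmup.py
```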
Answer 1:
There seems to be no easy way to achieve this, since the CUDA runtime implicitly and lazily compiles missing cubins from the shipped kernel sources (PTX), as discussed here.
I solved this problem by rebuilding TensorFlow myself, with some helper scripts that detect the current CUDA/GPU configuration and generate the required TensorFlow build parameters (detect-cuda.py, build-tensorflow.sh).
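A sketch of what such a detection helper might look like (this is not the author's actual detect-cuda.py, just an illustration): query each GPU's compute capability and emit the comma-separated string that TF_CUDA_COMPUTE_CAPABILITIES expects. The `nvidia-smi --query-gpu=compute_cap` field is assumed to be available, which is only true on fairly recent drivers; older setups would need deviceQuery or the CUDA driver API instead.

```python
# Hypothetical sketch of a detect-cuda.py helper: query each GPU's compute
# capability via nvidia-smi and format the value expected by TensorFlow's
# TF_CUDA_COMPUTE_CAPABILITIES build variable (e.g. "3.5,6.1").
import subprocess

def parse_compute_caps(nvidia_smi_output: str) -> str:
    """Turn raw lines like '6.1\n6.1\n3.5\n' into '3.5,6.1' (sorted, deduped)."""
    caps = sorted({line.strip() for line in nvidia_smi_output.splitlines()
                   if line.strip()})
    return ",".join(caps)

def detect_compute_caps() -> str:
    # compute_cap query field assumed present; older drivers lack it.
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=compute_cap", "--format=csv,noheader"],
        universal_newlines=True,
    )
    return parse_compute_caps(out)

if __name__ == "__main__":
    # e.g. export TF_CUDA_COMPUTE_CAPABILITIES=$(python detect-cuda.py)
    print(detect_compute_caps())
```

The resulting string would then be exported before running TensorFlow's `./configure` so that cubins for the local GPU are compiled ahead of time instead of being JIT-compiled on first run.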
Source: https://stackoverflow.com/questions/40503892/how-to-build-cuda-jit-caches-for-all-available-kernels-in-tensorflow-programmati