Question
I encountered the "first-run slow-down" problem with GTX 1080 cards and nvidia-docker as discussed in this question.
I'm using the TensorFlow build from its official pip package and a custom docker image based on nvidia-docker's Ubuntu 16.04 base image.
How do I make TensorFlow load (and build JIT caches for) all registered CUDA kernels programmatically in a Dockerfile, rather than manually rebuilding TensorFlow with the TF_CUDA_COMPUTE_CAPABILITIES environment variable?
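For context, one way this warm-up could be attempted is a Dockerfile fragment like the sketch below. It assumes the build step actually has GPU access through the NVIDIA runtime (which plain `docker build` typically does not, so this is the crux of the problem), and `warmup.py` is a hypothetical placeholder script that exercises the TensorFlow ops the application uses. `CUDA_CACHE_PATH` and `CUDA_CACHE_MAXSIZE` are real CUDA environment variables controlling the JIT cache location and size; the size value here is illustrative.

```dockerfile
# Hypothetical sketch: persist CUDA's JIT cache inside the image by pointing
# CUDA_CACHE_PATH at a fixed location and running a warm-up script at build
# time. Only works if this RUN step can see a GPU; warmup.py is a placeholder.
ENV CUDA_CACHE_PATH=/opt/cuda-cache \
    CUDA_CACHE_MAXSIZE=2147483648
COPY warmup.py /tmp/warmup.py
RUN python /tmp/warmup.py
```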
Answer 1:
There seems to be no easy way to achieve this, since the CUDA runtime implicitly and lazily compiles missing cubins from the shipped kernel sources (PTX), as discussed here.
I solved this problem by rebuilding TensorFlow myself, with some helper scripts that detect the current CUDA/GPU configuration and generate the required TensorFlow build parameters (detect-cuda.py, build-tensorflow.sh).
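A sketch of what such a detection helper might look like (this is not the author's actual detect-cuda.py, just an illustration): query each GPU's compute capability and emit the comma-separated string that TF_CUDA_COMPUTE_CAPABILITIES expects. The `nvidia-smi --query-gpu=compute_cap` field is assumed to be available, which is only true on fairly recent drivers; older setups would need deviceQuery or the CUDA driver API instead.

```python
# Hypothetical sketch of a detect-cuda.py helper: query each GPU's compute
# capability via nvidia-smi and format the value expected by TensorFlow's
# TF_CUDA_COMPUTE_CAPABILITIES build variable (e.g. "3.5,6.1").
import subprocess

def parse_compute_caps(nvidia_smi_output: str) -> str:
    """Turn raw lines like '6.1\n6.1\n3.5\n' into '3.5,6.1' (sorted, deduped)."""
    caps = sorted({line.strip() for line in nvidia_smi_output.splitlines()
                   if line.strip()})
    return ",".join(caps)

def detect_compute_caps() -> str:
    # compute_cap query field assumed present; older drivers lack it.
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=compute_cap", "--format=csv,noheader"],
        universal_newlines=True,
    )
    return parse_compute_caps(out)

if __name__ == "__main__":
    # e.g. export TF_CUDA_COMPUTE_CAPABILITIES=$(python detect-cuda.py)
    print(detect_compute_caps())
```

The resulting string would then be exported before running TensorFlow's `./configure` so that cubins for the local GPU are compiled ahead of time instead of being JIT-compiled on first run.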
Source: https://stackoverflow.com/questions/40503892/how-to-build-cuda-jit-caches-for-all-available-kernels-in-tensorflow-programmati