I\'m running tensorflow 0.8.0 for python3 (pip installation), and the following file test.py
:
import tensorflow as tf
I think your GPU GTX 860M is a sm_50 device. The default TensorFlow binary supports sm_35 and sm_52 by default. That means your binary only has PTX, and the Cuda runtime has to JIT them into SASS on the first run of that kernel, and that takes one minute or so. But they should be cached in later runs, unless the caching was explicitly disabled.
The first call to eval() or run() is typically much slower than subsequent calls since it needs to setup the session. Subsequent calls to eval/run are typically much faster.