I have implemented a TensorFlow DNN model (two hidden layers with ELU activations, trained on MNIST) as a Python class, in order to wrap the TF calls from within another library.
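For context, a minimal sketch of the kind of wrapper class described above (this assumes TF 1.x; the layer sizes, names, and optimizer are illustrative, not the asker's actual code):

```python
import tensorflow as tf

class MnistDNN(object):
    """Two-hidden-layer ELU network on MNIST, wrapping TF calls in a plain Python class."""

    def __init__(self, n_hidden1=300, n_hidden2=100, dtype=tf.float32):
        self.graph = tf.Graph()
        with self.graph.as_default():
            self.x = tf.placeholder(dtype, shape=[None, 784], name="x")
            self.y = tf.placeholder(tf.int64, shape=[None], name="y")
            h1 = tf.layers.dense(self.x, n_hidden1, activation=tf.nn.elu)
            h2 = tf.layers.dense(h1, n_hidden2, activation=tf.nn.elu)
            self.logits = tf.layers.dense(h2, 10)
            loss = tf.losses.sparse_softmax_cross_entropy(labels=self.y,
                                                          logits=self.logits)
            self.train_op = tf.train.AdamOptimizer().minimize(loss)
            init = tf.global_variables_initializer()
        self.sess = tf.Session(graph=self.graph)
        self.sess.run(init)

    def train_step(self, x_batch, y_batch):
        # Callers from the other library only see this method, not the TF graph.
        self.sess.run(self.train_op, feed_dict={self.x: x_batch, self.y: y_batch})
```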
Most consumer graphics cards, like the GTX 980 and 1080, are stripped of most of their double-precision floating-point hardware. Since they are much cheaper, and therefore more ubiquitous, than the newer Tesla units (which do have FP64 hardware), double-precision calculations on these cards are very slow compared to single precision: FP64 on a GPU without dedicated FP64 hardware runs at roughly 1/32 the throughput of FP32. I believe this is why frameworks tend to place FP32 calculations on the GPU and FP64 calculations on the CPU (which is faster for FP64 on most systems). Hopefully, future frameworks will test the GPU's capabilities at runtime to decide where to assign the FP64 calculations.
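A hedged sketch of doing this assignment by hand in TF 1.x: keep FP32 math on the GPU and pin FP64 work to the CPU. The device strings assume a single GPU at "/gpu:0"; adjust for your setup.

```python
import tensorflow as tf

with tf.device("/gpu:0"):
    a32 = tf.random_normal([1000, 1000], dtype=tf.float32)
    fast = tf.matmul(a32, a32)        # uses the GPU's plentiful FP32 units

with tf.device("/cpu:0"):
    a64 = tf.cast(a32, tf.float64)
    precise = tf.matmul(a64, a64)     # FP64 is often faster on the CPU than on a consumer GPU

# allow_soft_placement lets TF fall back to another device if a kernel is missing.
with tf.Session(config=tf.ConfigProto(allow_soft_placement=True)) as sess:
    sess.run([fast, precise])
```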
As Yaroslav noted, mean, in particular, was not yet implemented for the GPU, but it is now available, so these operations should run on the GPU with the latest TensorFlow (as per the DEVICE_GPU registration at that link).
Prior to the availability of mean on the GPU, the status was:
(a) You can implement mean by hand, because reduce_sum is available on the GPU (see the sketch after this list).
(b) I've re-pinged someone to see if there's an easy way to add the GPU support, but we'll see.
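A minimal sketch of option (a): building a mean out of reduce_sum and a cast, so the reduction can run on the GPU even when no mean kernel is registered (TF 1.x style; the tensor here is just a placeholder example).

```python
import tensorflow as tf

x = tf.random_normal([128, 10], dtype=tf.float32)

# Equivalent to tf.reduce_mean(x, axis=0), assembled from ops that have GPU kernels.
n = tf.cast(tf.shape(x)[0], x.dtype)
mean_by_hand = tf.reduce_sum(x, axis=0) / n

with tf.Session() as sess:
    print(sess.run(mean_by_hand))
```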
Re float64 on GPU: someone opened an issue three days ago with a patch for supporting float64 reductions on the GPU. It is currently being reviewed and tested.
No, it doesn't matter if it's wrapped in Python - it's really just about whether a kernel has been defined for the op to execute on the GPU or not. In many cases, the answer to "why is X supported on the GPU but Y not?" comes down to whether there has been demand for Y to run on the GPU. The answer for float64 is simpler: float32 is a lot faster, so in most cases people work to make their models run in float32 when possible, because it gives all-around speed benefits.
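One hedged way to see where your ops actually land is TF 1.x's log_device_placement, which prints each op's assigned device when the session starts (the ops below are illustrative; whether the float64 reduction lands on the GPU depends on your TF build):

```python
import tensorflow as tf

x = tf.random_normal([256, 256], dtype=tf.float32)
m32 = tf.reduce_mean(x)                        # should be placed on the GPU if a kernel exists
m64 = tf.reduce_mean(tf.cast(x, tf.float64))   # may fall back to the CPU on builds without a float64 kernel

config = tf.ConfigProto(log_device_placement=True, allow_soft_placement=True)
with tf.Session(config=config) as sess:
    sess.run([m32, m64])
```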