Float16 slower than float32 in Keras

终归单人心 2020-12-30 01:30

I'm testing out my new NVIDIA Titan V, which supports float16 operations. I noticed that during training, float16 is much slower (~800 ms/step) than float32 (~500 ms/step).

2 Answers
  • 2020-12-30 02:03

    I updated to CUDA 10.0, cuDNN 7.4.1, TensorFlow 1.13.1, Keras 2.2.4, and Python 3.7.3. Using the same code as in the OP, training was marginally faster with float16 than with float32.

    I fully expect that a more complex network architecture would show a bigger difference in performance, but I didn't test this.

  • 2020-12-30 02:09

    From the documentation of cuDNN (section 2.7, subsection Type Conversion) you can see:

    Note: Accumulators are 32-bit integers which wrap on overflow.

    The documentation states that this holds for the standard INT8 data type of the data input, the filter input, and the output.

    Under those assumptions, @jiandercy is right: the float16 values are converted to float32 and then converted back before the result is returned, so float16 would be slower.
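To make the two behaviours mentioned in this answer concrete, here is a minimal sketch in plain Python. It is only an illustration of what "32-bit accumulators that wrap on overflow" and "float16 to float32 and back" mean numerically; it is not cuDNN's implementation and says nothing about actual GPU timing. All helper names (`to_float16`, `to_float32`, `wrap_int32`, `dot_int8`, `dot_float16`) are hypothetical, and half precision is emulated with the `"e"` format of the standard `struct` module.

```python
import struct


def to_float16(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision (float16) value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_float32(x: float) -> float:
    """Round x to the nearest IEEE 754 single-precision (float32) value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def wrap_int32(n: int) -> int:
    """Reduce n modulo 2**32 and reinterpret it as a signed 32-bit integer."""
    n &= 0xFFFFFFFF
    return n - 0x100000000 if n >= 0x80000000 else n


def dot_int8(a, b):
    """Dot product of INT8 values using an emulated wrapping 32-bit accumulator."""
    acc = 0
    for x, y in zip(a, b):
        acc = wrap_int32(acc + x * y)
    return acc


def dot_float16(a, b):
    """Dot product with float16 inputs, float32 accumulation, float16 output."""
    acc = 0.0
    for x, y in zip(a, b):
        # Up-convert each float16 input, then accumulate in float32.
        product = to_float32(to_float16(x)) * to_float32(to_float16(y))
        acc = to_float32(acc + product)
    # Convert back to float16 before returning the result.
    return to_float16(acc)


if __name__ == "__main__":
    print(to_float16(0.1))          # 0.1 is not exactly representable in float16
    print(wrap_int32(2**31))        # one past the largest signed 32-bit value
    print(dot_int8([127] * 3, [127] * 3))
    print(dot_float16([1.0, 2.0], [3.0, 4.0]))
```

The extra rounding steps in `dot_float16` are the conversions this answer refers to. Whether they cost measurable time on a given GPU depends on the hardware and library versions, which this sketch does not model.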
