Question
I have a TensorFlow 1.14 float32 SavedModel that I want to convert to float16. According to https://docs.nvidia.com/deeplearning/frameworks/tf-trt-user-guide/index.html#usage-example , I can pass "FP16" as precision_mode to convert the model to FP16. But after checking TensorBoard, the converted model is still FP32: the network parameters are DT_FLOAT instead of DT_HALF, and the size of the converted model is similar to the model before conversion. (Here I assume that, if converted successfully, the model should be about half as large, since the parameters are cut in half.)
import tensorflow as tf
from tensorflow.python.compiler.tensorrt import trt_convert as trt

FLAGS = tf.flags.FLAGS
tf.flags.DEFINE_string('saved_model_dir', '', 'Input saved model dir.')
tf.flags.DEFINE_bool('use_float16', False,
                     'Whether we want to quantize it to float16.')
tf.flags.DEFINE_string('output_dir', '', 'Output saved model dir.')


def main(argv):
    del argv  # Unused.
    saved_model_dir = FLAGS.saved_model_dir
    output_dir = FLAGS.output_dir
    use_float16 = FLAGS.use_float16
    precision_mode = "FP16" if use_float16 else "FP32"
    # Convert the SavedModel with TF-TRT at the requested precision and save it.
    converter = trt.TrtGraphConverter(input_saved_model_dir=saved_model_dir,
                                      precision_mode=precision_mode)
    converter.convert()
    converter.save(output_dir)


if __name__ == '__main__':
    tf.app.run(main)
Any advice or suggestions are very welcome! Thanks.
Answer 1:
You specify the precision mode correctly for TF-TRT. But checking the network parameters on TensorBoard will not reveal how the TensorRT engine is internally storing the parameters of the converted model.
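As a side note, one way to check whether any TensorRT engines were actually created is to count the TRTEngineOp nodes in the converted graph. A minimal sketch that parses saved_model.pb directly; the directory path is a placeholder and the check only looks at the top-level graph, so treat it as a starting point rather than a definitive test:

import os
from tensorflow.core.protobuf import saved_model_pb2

# Placeholder path to the converted SavedModel directory.
converted_dir = '/path/to/converted_saved_model'

saved_model = saved_model_pb2.SavedModel()
with open(os.path.join(converted_dir, 'saved_model.pb'), 'rb') as f:
    saved_model.ParseFromString(f.read())

graph_def = saved_model.meta_graphs[0].graph_def
trt_nodes = [n.name for n in graph_def.node if n.op == 'TRTEngineOp']
print('Found %d TRTEngineOp nodes' % len(trt_nodes))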
There are a few things to consider:
In TF-TRT we still keep the original TensorFlow weights after the model is converted to TensorRT. This is done to provide a fallback to native TensorFlow execution in case the TensorRT path fails for some reason. This means the saved_model.pb file will be at least as large as the original model file.
The TensorRT engine contains a copy of the weights of the converted nodes. In FP16 mode, the TensorRT engine will be roughly half the size of the original model (assuming that most of the nodes are converted). Since this is added on top of the original weights, saved_model.pb would end up around 1.5x the size of the original model.
If we set is_dynamic_op=True (the default in TF2), then TensorRT engine creation is delayed until the first inference call. If we save the model before running any inference, only a placeholder TRTEngineOp is added to the graph, which does not really increase the model size (see the sketch after these points for a TF2 way to force engine creation before saving).
In TF2 the TensorRT engines are serialized into separate files inside the Assets directory.
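For reference, in TF2 the engine build can be forced before saving by calling the converter's build() method with a representative input. A minimal sketch, assuming TF 2.x; the paths and the (1, 224, 224, 3) input shape are placeholders to adjust to the actual model:

import numpy as np
from tensorflow.python.compiler.tensorrt import trt_convert as trt

# Placeholder paths; adjust to the actual model.
saved_model_dir = '/path/to/fp32_saved_model'
output_dir = '/path/to/trt_fp16_saved_model'

params = trt.DEFAULT_TRT_CONVERSION_PARAMS._replace(precision_mode='FP16')
converter = trt.TrtGraphConverterV2(input_saved_model_dir=saved_model_dir,
                                    conversion_params=params)
converter.convert()


def input_fn():
    # One representative batch so the TensorRT engines are built before saving.
    yield (np.zeros((1, 224, 224, 3), dtype=np.float32),)


converter.build(input_fn=input_fn)  # builds the engines now instead of at first inference
converter.save(output_dir)          # engines are serialized under the assets directory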
Answer 2:
Please try changing:
tf.flags.DEFINE_bool('use_float16', False, 'Whether we want to quantize it to float16.')
to
tf.flags.DEFINE_bool('use_float16', True, 'Whether we want to quantize it to float16.')
This should either work or produce a meaningful error log, because with the current code precision_mode gets set to "FP32". You need precision_mode = "FP16" to try out half precision.
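Alternatively, instead of changing the default in the code, the flag can be passed at runtime, e.g. running the script with --use_float16=True --saved_model_dir=/path/to/model --output_dir=/path/to/output (the paths here are placeholders).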
Source: https://stackoverflow.com/questions/60427672/problem-converting-tensorflow-saved-model-from-float32-to-float16-using-tensorrt