Graph optimizations on a TensorFlow servable created using tf.estimator

猫巷女王i 2021-02-19 07:12

Context:

I have a simple classifier based on tf.estimator.DNNClassifier that takes text and outputs probabilities over intent tags. I am able to trai

1 answer
  •  悲&欢浪女
    2021-02-19 07:37

    We can optimize a TensorFlow model, or reduce its size, using the following methods:

    1. Freezing: Convert the variables stored in a checkpoint file of the SavedModel into constants stored directly in the model graph. This reduces the overall size of the model.

    2. Pruning: Strip nodes that the prediction path and the graph outputs do not use, merge duplicate nodes, and clean up other node ops such as summary and identity.

    3. Constant folding: Look for any sub-graphs within the model that always evaluate to constant expressions, and replace them with those constants.

    4. Folding batch norms: Fold the multiplications introduced in batch normalization into the weight multiplications of the previous layer.

    5. Quantization: Convert weights from floating point to lower precision, such as 16 or 8 bits.
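
    To make pruning and constant folding concrete, here is a minimal sketch on a toy graph. The dictionary format and the helper names `prune` and `fold_constants` are assumptions made up for illustration; this is not TensorFlow's GraphDef and not how the Graph Transform Tool is implemented.

```python
# Toy graph: node name -> (op, payload).
# "const" nodes carry a value, "placeholder" nodes carry None,
# and every other op carries a list of input node names.
# This mini-format is hypothetical, not TensorFlow's GraphDef.

def prune(graph, outputs):
    """Keep only the nodes reachable from the requested output nodes."""
    keep, stack = set(), list(outputs)
    while stack:
        name = stack.pop()
        if name in keep:
            continue
        keep.add(name)
        op, payload = graph[name]
        if op not in ("const", "placeholder"):
            stack.extend(payload)  # visit the inputs of this node
    return {name: node for name, node in graph.items() if name in keep}


def fold_constants(graph):
    """Replace add/mul nodes whose inputs are all constants by one constant."""
    ops = {"add": lambda a, b: a + b, "mul": lambda a, b: a * b}
    folded = dict(graph)
    changed = True
    while changed:  # repeat so that chains of constant nodes collapse too
        changed = False
        for name, (op, payload) in list(folded.items()):
            if op in ops and all(folded[a][0] == "const" for a in payload):
                values = [folded[a][1] for a in payload]
                folded[name] = ("const", ops[op](*values))
                changed = True
    return folded


graph = {
    "x": ("placeholder", None),
    "two": ("const", 2.0),
    "three": ("const", 3.0),
    "six": ("mul", ["two", "three"]),  # always evaluates to 6.0
    "y": ("mul", ["x", "six"]),        # the prediction output
    "summary": ("add", ["x", "two"]),  # not needed to compute "y"
}

optimized = prune(fold_constants(graph), ["y"])
print(sorted(optimized))
```

    After folding, "six" becomes a constant, so "two" and "three" are no longer reachable from "y" and are pruned together with the unused "summary" node.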

    The following code freezes a graph:

    import os

    from tensorflow.python.saved_model import tag_constants
    from tensorflow.python.tools import freeze_graph

    # saved_model_dir, output_filename and output_node_names are assumed to be
    # defined by the caller (TensorFlow 1.x API).
    output_graph_filename = os.path.join(saved_model_dir, output_filename)
    initializer_nodes = ''

    freeze_graph.freeze_graph(
        input_saved_model_dir=saved_model_dir,
        output_graph=output_graph_filename,
        saved_model_tags=tag_constants.SERVING,
        output_node_names=output_node_names,
        initializer_nodes=initializer_nodes,
        input_graph=None, input_saver=False, input_binary=False,
        input_checkpoint=None, restore_op_name=None, filename_tensor_name=None,
        clear_devices=False, input_meta_graph=False)
    

    The following code applies pruning and constant folding:

    import os

    import tensorflow as tf  # TensorFlow 1.x API
    from tensorflow.tools.graph_transforms import TransformGraph

    def get_graph_def_from_file(graph_filepath):
      # Parse a serialized (binary) GraphDef, e.g. a frozen model, from disk.
      with tf.Graph().as_default():
        with tf.gfile.GFile(graph_filepath, 'rb') as f:
          graph_def = tf.GraphDef()
          graph_def.ParseFromString(f.read())
          return graph_def

    def optimize_graph(model_dir, graph_filename, transforms, output_node):
      input_names = []
      output_names = [output_node]
      if graph_filename is None:
        # Note: get_graph_def_from_saved_model is not defined in this answer;
        # it has to return the GraphDef loaded from the SavedModel in model_dir.
        graph_def = get_graph_def_from_saved_model(model_dir)
      else:
        graph_def = get_graph_def_from_file(
            os.path.join(model_dir, graph_filename))
      # Apply the requested transforms in the order given.
      optimized_graph_def = TransformGraph(
          graph_def, input_names, output_names, transforms)
      tf.train.write_graph(optimized_graph_def, logdir=model_dir,
                           as_text=False, name='optimized_model.pb')
      print('Graph optimized!')
    

    We call the code on our model by passing a list of the desired optimizations, like so:

    transforms = ['remove_nodes(op=Identity)', 'merge_duplicate_nodes',
     'strip_unused_nodes','fold_constants(ignore_errors=true)',
     'fold_batch_norms']
    
    optimize_graph(saved_model_dir, "frozen_model.pb" , transforms, 'head/predictions/class_ids')
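
    Of the transforms in that list, fold_batch_norms is probably the least obvious, so here is a minimal sketch of the arithmetic behind it for a single dense unit. The function names, the scalar batch-norm parameters and the epsilon value are assumptions chosen for illustration; the actual transform rewrites nodes in the graph rather than calling anything like this.

```python
import math


def dense(x, weights, bias):
    """One dense unit: dot product of inputs and weights, plus bias."""
    return sum(xi * wi for xi, wi in zip(x, weights)) + bias


def dense_then_batch_norm(x, weights, bias, gamma, beta, mean, variance,
                          eps=1e-3):
    """Reference path: dense unit followed by inference-time batch norm."""
    z = dense(x, weights, bias)
    return gamma * (z - mean) / math.sqrt(variance + eps) + beta


def fold_batch_norm(weights, bias, gamma, beta, mean, variance, eps=1e-3):
    """Fold the batch-norm scale and shift into the weights and bias."""
    scale = gamma / math.sqrt(variance + eps)
    folded_weights = [w * scale for w in weights]
    folded_bias = (bias - mean) * scale + beta
    return folded_weights, folded_bias


# Hypothetical values, only to compare the two paths.
x = [0.5, -1.0, 2.0]
weights, bias = [0.2, 0.4, -0.1], 0.3
gamma, beta, mean, variance = 1.5, -0.2, 0.1, 0.8

expected = dense_then_batch_norm(x, weights, bias, gamma, beta, mean, variance)
folded_weights, folded_bias = fold_batch_norm(
    weights, bias, gamma, beta, mean, variance)
actual = dense(x, folded_weights, folded_bias)
print(math.isclose(expected, actual))
```

    Because the batch-norm statistics are constants at inference time, the folded unit computes the same value with one multiplication stage instead of two.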
    

    The following code applies quantization:

    # Passing None as graph_filename makes optimize_graph read the SavedModel itself.
    transforms = ['quantize_nodes', 'quantize_weights']
    optimize_graph(saved_model_dir, None, transforms, 'head/predictions/class_ids')
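
    As a rough illustration of what converting weights to 8 bits means, here is a minimal sketch of linear min/max quantization in plain Python. The helpers `quantize` and `dequantize` are hypothetical names, and the scheme shown is an assumption for illustration; it is not claimed to match exactly what quantize_weights does internally.

```python
def quantize(weights, bits=8):
    """Map floats onto integer codes in [0, 2**bits - 1] using min/max scaling."""
    lo, hi = min(weights), max(weights)
    levels = 2 ** bits - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((w - lo) / scale) for w in weights]
    return codes, lo, scale


def dequantize(codes, lo, scale):
    """Recover approximate float values from the integer codes."""
    return [lo + c * scale for c in codes]


weights = [-1.0, -0.25, 0.0, 0.5, 1.0]
codes, lo, scale = quantize(weights)
restored = dequantize(codes, lo, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, max_error)
```

    Each weight is stored as one byte plus a shared minimum and scale, at the cost of a rounding error of at most half a quantization step.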
    

    Once the optimizations are applied, we need to convert the optimized GraphDef back to a SavedModel. The code for that is shown below:

    def convert_graph_def_to_saved_model(export_dir, graph_filepath):
      # Start from a clean export directory.
      if tf.gfile.Exists(export_dir):
        tf.gfile.DeleteRecursively(export_dir)
      graph_def = get_graph_def_from_file(graph_filepath)
      with tf.Session(graph=tf.Graph()) as session:
        tf.import_graph_def(graph_def, name='')
        tf.saved_model.simple_save(
            session,
            export_dir,
            # Every Placeholder node in the graph becomes a serving input.
            inputs={
                node.name: session.graph.get_tensor_by_name(
                    '{}:0'.format(node.name))
                for node in graph_def.node if node.op == 'Placeholder'},
            outputs={'class_ids': session.graph.get_tensor_by_name(
                'head/predictions/class_ids:0')}
        )
        print('Optimized graph converted to SavedModel!')
    

    Example usage:

    optimized_export_dir = os.path.join(export_dir, 'optimized')
    optimized_filepath = os.path.join(saved_model_dir, 'optimized_model.pb')
    convert_graph_def_to_saved_model(optimized_export_dir, optimized_filepath)
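
    To check whether the optimizations actually reduced the model size, one option is to compare the on-disk size of the original and the optimized export directories. The helper `dir_size_bytes` below is a hypothetical name, and the sketch uses only the standard library.

```python
import os


def dir_size_bytes(path):
    """Sum the sizes of all regular files below path, in bytes."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total
```

    For example, comparing dir_size_bytes(saved_model_dir) with dir_size_bytes(optimized_export_dir) shows how much smaller the optimized SavedModel is, if at all.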
    

    For more information, see the following article, which @gobrewers14 mentioned:

    https://medium.com/google-cloud/optimizing-tensorflow-models-for-serving-959080e9ddbf
