Question
Context:
I have a simple classifier based on tf.estimator.DNNClassifier that takes text and outputs probabilities over a set of intent tags. I am able to train and export the model to a servable, as well as serve the servable using TensorFlow Serving. The problem is that this servable is too big (around 1GB), so I wanted to try some TensorFlow graph transforms to reduce the size of the files being served.
Problem:
I understand how to take the saved_model.pb and use freeze_model.py to create a new .pb file that can be used to call transforms on. The result of these transforms (also a .pb file) is not a servable and cannot be used with TensorFlow Serving.
How can a developer go from:
saved model -> graph transforms -> back to a servable
There's documentation that suggests this is certainly possible, but it's not at all intuitive from the docs how to do it.
What I've Tried:
import tensorflow as tf
from tensorflow.saved_model import simple_save
from tensorflow.saved_model import signature_constants
from tensorflow.saved_model import tag_constants
from tensorflow.tools.graph_transforms import TransformGraph

with tf.Session(graph=tf.Graph()) as sess_meta:
    meta_graph_def = tf.saved_model.loader.load(
        sess_meta,
        [tag_constants.SERVING],
        "/model/path")

    graph_def = meta_graph_def.graph_def

    other_graph_def = TransformGraph(
        graph_def,
        ["Placeholder"],
        ["dnn/head/predictions/probabilities"],
        ["quantize_weights"])

    with tf.Graph().as_default():
        graph = tf.get_default_graph()
        tf.import_graph_def(other_graph_def)
        in_tensor = graph.get_tensor_by_name(
            "import/Placeholder:0")
        out_tensor = graph.get_tensor_by_name(
            "import/dnn/head/predictions/probabilities:0")

        inputs = {"inputs": in_tensor}
        outputs = {"outputs": out_tensor}

        simple_save(sess_meta, "./new", inputs, outputs)
My idea was to load the servable, extract the graph_def from the meta_graph_def, transform the graph_def and then try to recreate the servable. This seems to be the incorrect approach.
Is there a way to successfully perform transforms (to reduce file size at inference) on a graph from an exported servable, and then recreate a servable with the transformed graph?
Thanks.
Update (2018-08-28):
Found contrib.meta_graph_transform() which looks promising.
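A rough sketch of how it might be used (untested on my part; the node names and transform list are carried over from the attempt above, and the signature is the one from tf.contrib in TF 1.x):
import tensorflow as tf
from tensorflow.contrib.meta_graph_transform import meta_graph_transform
from tensorflow.saved_model import tag_constants

# Untested sketch: meta_graph_transform applies Graph Transform Tool
# transforms to a whole MetaGraphDef (keeping signatures and collections
# consistent) instead of operating on the bare GraphDef.
with tf.Session(graph=tf.Graph()) as sess:
    base_meta_graph_def = tf.saved_model.loader.load(
        sess,
        [tag_constants.SERVING],
        "/model/path")

transformed_meta_graph_def = meta_graph_transform(
    base_meta_graph_def=base_meta_graph_def,
    input_names=["Placeholder"],
    output_names=["dnn/head/predictions/probabilities"],
    transforms=["quantize_weights"],
    tags=[tag_constants.SERVING])
The transformed MetaGraphDef would still need to be written back out as a SavedModel to get a servable.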
Update (2018-12-03):
A related GitHub issue I opened seems to be resolved by a detailed blog post, which is linked at the end of the ticket.
Answer 1:
We can optimize or reduce the size of a TensorFlow model using the methods mentioned below:
Freezing: Convert the variables stored in a checkpoint file of the SavedModel into constants stored directly in the model graph. This reduces the overall size of the model.
Pruning: Strip unused nodes in the prediction path and the outputs of the graph, merge duplicate nodes, and clean up other node ops like summary, identity, etc.
Constant folding: Look for any sub-graphs within the model that always evaluate to constant expressions, and replace them with those constants.
Folding batch norms: Fold the multiplications introduced in batch normalization into the weight multiplications of the previous layer.
Quantization: Convert weights from floating point to lower precision, such as 16 or 8 bits.
Code for Freezing a Graph is mentioned below:
import os

from tensorflow.python.tools import freeze_graph
from tensorflow.saved_model import tag_constants

# saved_model_dir, output_filename and output_node_names come from your
# export step (see the example values after this snippet).
output_graph_filename = os.path.join(saved_model_dir, output_filename)
initializer_nodes = ''

freeze_graph.freeze_graph(
    input_saved_model_dir=saved_model_dir,
    output_graph=output_graph_filename,
    saved_model_tags=tag_constants.SERVING,
    output_node_names=output_node_names,
    initializer_nodes=initializer_nodes,
    input_graph=None,
    input_saver=False,
    input_binary=False,
    input_checkpoint=None,
    restore_op_name=None,
    filename_tensor_name=None,
    clear_devices=False,
    input_meta_graph=False)
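The call above assumes saved_model_dir, output_filename and output_node_names are already defined. For example (the export directory here is a hypothetical placeholder, while the filename and node name match the calls later in this answer):
# Hypothetical values; adjust to your own export.
saved_model_dir = './export/1535000000'
output_filename = 'frozen_model.pb'
output_node_names = 'head/predictions/class_ids'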
Code for Pruning and Constant Folding is mentioned below:
import os

import tensorflow as tf
from tensorflow.python.framework import ops
from tensorflow.saved_model import tag_constants
from tensorflow.tools.graph_transforms import TransformGraph

def get_graph_def_from_file(graph_filepath):
    # Read a frozen GraphDef from a binary .pb file.
    with ops.Graph().as_default():
        with tf.gfile.GFile(graph_filepath, 'rb') as f:
            graph_def = tf.GraphDef()
            graph_def.ParseFromString(f.read())
            return graph_def

def get_graph_def_from_saved_model(saved_model_dir):
    # Helper referenced below but missing from the original snippet; this
    # follows the same loader pattern as the question: pull the GraphDef
    # straight out of a SavedModel's MetaGraphDef.
    with tf.Session(graph=tf.Graph()) as session:
        meta_graph_def = tf.saved_model.loader.load(
            session, [tag_constants.SERVING], saved_model_dir)
    return meta_graph_def.graph_def

def optimize_graph(model_dir, graph_filename, transforms, output_node):
    input_names = []
    output_names = [output_node]
    if graph_filename is None:
        graph_def = get_graph_def_from_saved_model(model_dir)
    else:
        graph_def = get_graph_def_from_file(
            os.path.join(model_dir, graph_filename))
    optimized_graph_def = TransformGraph(
        graph_def, input_names, output_names, transforms)
    tf.train.write_graph(
        optimized_graph_def,
        logdir=model_dir,
        as_text=False,
        name='optimized_model.pb')
    print('Graph optimized!')
We call the code on our model by passing a list of the desired optimizations, like so:
transforms = [
    'remove_nodes(op=Identity)',
    'merge_duplicate_nodes',
    'strip_unused_nodes',
    'fold_constants(ignore_errors=true)',
    'fold_batch_norms',
]

optimize_graph(saved_model_dir, 'frozen_model.pb', transforms,
               'head/predictions/class_ids')
Code for Quantization is mentioned below:
transforms = ['quantize_nodes', 'quantize_weights']

optimize_graph(saved_model_dir, None, transforms,
               'head/predictions/class_ids')
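As a quick check of the size reduction (my addition, not from the original answer):
import os

# Compare on-disk sizes of the frozen and the optimized/quantized graphs.
for name in ('frozen_model.pb', 'optimized_model.pb'):
    path = os.path.join(saved_model_dir, name)
    print('%s: %.1f MB' % (path, os.path.getsize(path) / float(2 ** 20)))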
Once the optimizations are applied, we need to convert the optimized GraphDef back into a SavedModel so that it can be served. Code for that is shown below:
def convert_graph_def_to_saved_model(export_dir, graph_filepath):
    if tf.gfile.Exists(export_dir):
        tf.gfile.DeleteRecursively(export_dir)
    graph_def = get_graph_def_from_file(graph_filepath)
    with tf.Session(graph=tf.Graph()) as session:
        tf.import_graph_def(graph_def, name='')
        tf.saved_model.simple_save(
            session,
            export_dir,
            inputs={
                node.name: session.graph.get_tensor_by_name(
                    '{}:0'.format(node.name))
                for node in graph_def.node if node.op == 'Placeholder'},
            outputs={'class_ids': session.graph.get_tensor_by_name(
                'head/predictions/class_ids:0')})
    print('Optimized graph converted to SavedModel!')
Example Code is shown below:
optimized_export_dir = os.path.join(export_dir, 'optimized')
optimized_filepath = os.path.join(saved_model_dir, 'optimized_model.pb')
convert_graph_def_to_saved_model(optimized_export_dir, optimized_filepath)
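As a final sanity check (my addition, not part of the original answer), the new SavedModel can be loaded back to confirm it is a valid servable with the expected signature:
import tensorflow as tf
from tensorflow.saved_model import tag_constants

# Reload the optimized SavedModel and print its signatures; simple_save
# exports a single 'serving_default' signature.
with tf.Session(graph=tf.Graph()) as session:
    meta_graph_def = tf.saved_model.loader.load(
        session,
        [tag_constants.SERVING],
        optimized_export_dir)
    print(meta_graph_def.signature_def)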
For more information, refer to the link below, which was mentioned by @gobrewers14:
https://medium.com/google-cloud/optimizing-tensorflow-models-for-serving-959080e9ddbf
Source: https://stackoverflow.com/questions/51971050/graph-optimizations-on-a-tensorflow-serveable-created-using-tf-estimator