Eager load the entire model to estimate memory consumption of TensorFlow Serving

Submitted by 情到浓时终转凉 on 2019-12-10 15:54:54

Question


TensorFlow Serving lazily initializes nodes in the model DAG as prediction requests are executed. This makes it hard to estimate the memory (RAM) required to hold the entire model. Is there a standard way to force TensorFlow Serving to fully initialize/load the model into memory?


Answer 1:


You can use model warmup to force all the components to be loaded into memory. [1]

[1] https://www.tensorflow.org/tfx/serving/saved_model_warmup




Answer 2:


Adding the content of the link provided by @PedApps below.

Introduction:

The TensorFlow runtime has components that are lazily initialized, which can cause high latency for the first request(s) sent to a model after it is loaded.

This latency can be several orders of magnitude higher than that of a single inference request.

To reduce the impact of lazy initialization on request latency, it's possible to trigger the initialization of the sub-systems and components at model load time by providing a sample set of inference requests along with the SavedModel.

This process is known as "warming up" the model.

Usage:

SavedModel Warmup is supported for the Regress, Classify, MultiInference, and Predict APIs.

To trigger warmup of the model at load time, attach a warmup data file under the assets.extra subfolder of the SavedModel directory.

Requirements for model warmup to work correctly:

  • Warmup file name: 'tf_serving_warmup_requests'

  • File location: assets.extra/ (see the directory sketch after this list)

  • File format: TFRecord with each record as a PredictionLog.

  • Number of warmup records <= 1000.

  • The warmup data must be representative of the inference requests used at serving.
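For reference, a warmed-up SavedModel directory would look like the sketch below; the model name "my_model" and version "1" are placeholders for illustration:

my_model/
    1/
        saved_model.pb
        variables/
            variables.data-00000-of-00001
            variables.index
        assets.extra/
            tf_serving_warmup_requests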

Example code snippet producing warmup data:

import tensorflow as tf
from tensorflow_serving.apis import classification_pb2
from tensorflow_serving.apis import inference_pb2
from tensorflow_serving.apis import model_pb2
from tensorflow_serving.apis import predict_pb2
from tensorflow_serving.apis import prediction_log_pb2
from tensorflow_serving.apis import regression_pb2

def main():
    # The warmup file must be named 'tf_serving_warmup_requests' and placed
    # under the assets.extra/ subfolder of the SavedModel directory.
    # On TensorFlow 1.x, use tf.python_io.TFRecordWriter instead of tf.io.
    with tf.io.TFRecordWriter("tf_serving_warmup_requests") as writer:
        # Replace <request> with one of:
        #   predict_pb2.PredictRequest(..)
        #   classification_pb2.ClassificationRequest(..)
        #   regression_pb2.RegressionRequest(..)
        #   inference_pb2.MultiInferenceRequest(..)
        # For Classify/Regress/MultiInference requests, wrap the request in
        # the matching classify_log/regress_log/multi_inference_log field of
        # PredictionLog rather than predict_log.
        log = prediction_log_pb2.PredictionLog(
            predict_log=prediction_log_pb2.PredictLog(request=<request>))
        writer.write(log.SerializeToString())

if __name__ == "__main__":
    main()
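
For concreteness, here is a minimal runnable sketch that writes a single warmup record for the Predict API. The model name ("my_model"), signature name ("serving_default"), and input tensor name ("input") are assumptions for illustration; substitute the names your SavedModel actually exports.

import tensorflow as tf
from tensorflow_serving.apis import model_pb2
from tensorflow_serving.apis import predict_pb2
from tensorflow_serving.apis import prediction_log_pb2

def main():
    # Hypothetical model/signature/input names -- adjust to your SavedModel.
    request = predict_pb2.PredictRequest(
        model_spec=model_pb2.ModelSpec(
            name="my_model", signature_name="serving_default"),
        inputs={"input": tf.make_tensor_proto(
            [[1.0, 2.0, 3.0]], dtype=tf.float32)},
    )
    # Wrap the request in a PredictionLog record, as in the template above.
    log = prediction_log_pb2.PredictionLog(
        predict_log=prediction_log_pb2.PredictLog(request=request))
    # The file is written to the current directory here; copy it into
    # <saved_model_dir>/assets.extra/ before loading the model.
    with tf.io.TFRecordWriter("tf_serving_warmup_requests") as writer:
        writer.write(log.SerializeToString())

if __name__ == "__main__":
    main()

Once the file is in assets.extra/, TensorFlow Serving replays the record at model load time, which both warms the lazily initialized components and yields a more realistic resident-memory reading for the fully loaded model.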


Source: https://stackoverflow.com/questions/56012618/eager-load-the-entire-model-to-estimate-memory-consumption-of-tensorflow-serving
