Restoring a model trained with tf.estimator and feeding input through feed_dict

Submitted by 情到浓时终转凉″ on 2019-12-01 23:14:05

Question


I trained a ResNet with tf.estimator; the model was saved during the training process. The saved files consist of .data, .index and .meta files. I'd like to load this model back and get predictions for new images. The data was fed to the model during training using tf.data.Dataset. I have closely followed the resnet implementation given here.

I would like to restore the model and feed inputs to the nodes using a feed_dict.

First attempt

  # Rebuild the input pipeline (input_fn is defined further below and
  # takes is_training as its first argument).
  images, labels = input_fn(False, data_dir, batch_size=32, num_epochs=1)

  # Rebuild the graph.
  prediction = imagenet_model_fn(
      images, labels,
      {'batch_size': 32, 'data_format': 'channels_first', 'resnet_size': 18},
      mode=tf.estimator.ModeKeys.EVAL).predictions

  saver = tf.train.Saver()
  with tf.Session() as sess:
    ckpt = tf.train.get_checkpoint_state(r'./model')
    saver.restore(sess, ckpt.model_checkpoint_path)
    while True:
      try:
        pred, im = sess.run([prediction, images])
        print(pred)
      except tf.errors.OutOfRangeError:
        break

I fed a dataset that evaluates correctly on the same model with classifier.evaluate, but the above method gives wrong predictions: it returns the same class, with probability 1.0, for every image.

Second attempt

saver = tf.train.import_meta_graph(r'.\resnet\model\model-3220.meta')
sess = tf.Session()
saver.restore(sess, tf.train.latest_checkpoint(r'.\resnet\model'))
graph = tf.get_default_graph()
inputImage = graph.get_tensor_by_name('image:0')
logits = graph.get_tensor_by_name('logits:0')

# Get prediction
print(sess.run(logits, feed_dict={inputImage: newimage}))

This also gives wrong predictions compared to classifier.evaluate. I can even run sess.run(logits) without a feed_dict!

Third attempt

def serving_input_fn():
  receiver_tensor = {'feature': tf.placeholder(
      shape=[None, 384, 256, 3], dtype=tf.float32)}
  features = {'feature': receiver_tensor['feature']}
  return tf.estimator.export.ServingInputReceiver(features, receiver_tensor)

It fails with

Traceback (most recent call last):
  File "imagenet_main.py", line 213, in <module>
    tf.app.run(argv=[sys.argv[0]] + unparsed)
  File "C:\Users\Photogauge\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\platform\app.py", line 124, in run
    _sys.exit(main(argv))
  File "imagenet_main.py", line 204, in main
    resnet.resnet_main(FLAGS, imagenet_model_fn, input_fn)
  File "C:\Users\Photogauge\Desktop\iprings_images\models-master\models-master\official\resnet\resnet.py", line 527, in resnet_main
    classifier.export_savedmodel(export_dir_base=r"C:\Users\Photogauge\Desktop\iprings_images\models-master\models-master\official\resnet\export", serving_input_receiver_fn=serving_input_fn)
  File "C:\Users\Photogauge\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\estimator\estimator.py", line 528, in export_savedmodel
    config=self.config)
  File "C:\Users\Photogauge\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\estimator\estimator.py", line 725, in _call_model_fn
    model_fn_results = self._model_fn(features=features, **kwargs)
  File "imagenet_main.py", line 200, in imagenet_model_fn
    loss_filter_fn=None)
  File "C:\Users\Photogauge\Desktop\iprings_images\models-master\models-master\official\resnet\resnet.py", line 433, in resnet_model_fn
    tf.argmax(labels, axis=1), predictions['classes'])
  File "C:\Users\Photogauge\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\util\deprecation.py", line 316, in new_func
    return func(*args, **kwargs)
  File "C:\Users\Photogauge\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\ops\math_ops.py", line 208, in argmax
    return gen_math_ops.arg_max(input, axis, name=name, output_type=output_type)
  File "C:\Users\Photogauge\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\ops\gen_math_ops.py", line 508, in arg_max
    name=name)
  File "C:\Users\Photogauge\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 528, in _apply_op_helper
    (input_name, err))
ValueError: Tried to convert 'input' to a tensor and failed. Error: None values not supported.
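
For context: export_savedmodel calls the model_fn in PREDICT mode with labels=None, and the traceback shows resnet_model_fn computing tf.argmax(labels, axis=1) unconditionally, which is what raises the ValueError. Below is a minimal sketch (not the actual resnet.py code; the network and the num_classes param are stand-ins) of a model_fn that guards the label-dependent ops so export can succeed:

import tensorflow as tf

def model_fn(features, labels, mode, params):
    # Stand-in network; the real ResNet would be built here.
    logits = tf.layers.dense(tf.layers.flatten(features), params['num_classes'])
    predictions = {
        'classes': tf.argmax(logits, axis=1),
        'probabilities': tf.nn.softmax(logits),
    }
    if mode == tf.estimator.ModeKeys.PREDICT:
        # During export/serving, labels is None: build no loss or metrics.
        return tf.estimator.EstimatorSpec(mode=mode, predictions=predictions)

    loss = tf.losses.softmax_cross_entropy(onehot_labels=labels, logits=logits)
    accuracy = tf.metrics.accuracy(
        tf.argmax(labels, axis=1), predictions['classes'])
    train_op = tf.train.GradientDescentOptimizer(0.01).minimize(
        loss, global_step=tf.train.get_or_create_global_step())
    return tf.estimator.EstimatorSpec(
        mode=mode, loss=loss, train_op=train_op,
        eval_metric_ops={'accuracy': accuracy})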

The code I used for training and building the model is shown below:

Specification for parsing the dataset:

def parse_record(raw_record, is_training):
  keys_to_features = {
      'image/encoded':
          tf.FixedLenFeature((), tf.string, default_value=''),
      'image/class/label':
          tf.FixedLenFeature([], dtype=tf.int64, default_value=-1),
  }
  parsed = tf.parse_single_example(raw_record, keys_to_features)
  image = tf.image.decode_image(
      tf.reshape(parsed['image/encoded'], shape=[]),3)
  image = tf.image.convert_image_dtype(image, dtype=tf.float32)
  label = tf.cast(
      tf.reshape(parsed['image/class/label'], shape=[]),
      dtype=tf.int32)
  return image, tf.one_hot(label,2)

The following function parses the data and creates batches for training:

def input_fn(is_training, data_dir, batch_size, num_epochs=1):
  dataset = tf.data.Dataset.from_tensor_slices(
      filenames(is_training, data_dir))
  if is_training:
     dataset = dataset.shuffle(buffer_size=_FILE_SHUFFLE_BUFFER)
  dataset = dataset.flat_map(tf.data.TFRecordDataset)
  dataset = dataset.map(lambda value: parse_record(value, is_training),
                        num_parallel_calls=5)
  dataset = dataset.prefetch(batch_size)
  if is_training:
      dataset = dataset.shuffle(buffer_size=_SHUFFLE_BUFFER)
  dataset = dataset.repeat(num_epochs)
  dataset = dataset.batch(batch_size)

  iterator = dataset.make_one_shot_iterator()
  images, labels = iterator.get_next()
  return images, labels

A classifier is created as below for training on the training set and evaluation on the validation set:

classifier = tf.estimator.Estimator(
    model_fn=model_function, model_dir=flags.model_dir, config=run_config,
    params={
        'resnet_size': flags.resnet_size,
        'data_format': flags.data_format,
        'batch_size': flags.batch_size,
    })

# Training cycle
classifier.train(
    input_fn=lambda: input_function(
        True, flags.data_dir, flags.batch_size, flags.epochs_per_eval),
    hooks=[logging_hook])

# Evaluate the model
eval_results = classifier.evaluate(input_fn=lambda: input_function(
    False, flags.data_dir, flags.batch_size))

This is how I tried to load and get predictions from the model.

What is the right way to restore a saved model and perform inference on it? I want to feed images directly, without going through tf.data.Dataset.

Update

  1. The value of ckpt after running `ckpt = tf.train.get_checkpoint_state(r'./model')` is:

    model_checkpoint_path: "./model\model.ckpt-5980"
    all_model_checkpoint_paths: "./model\model.ckpt-5060"
    all_model_checkpoint_paths: "./model\model.ckpt-5061"
    all_model_checkpoint_paths: "./model\model.ckpt-5520"
    all_model_checkpoint_paths: "./model\model.ckpt-5521"
    all_model_checkpoint_paths: "./model\model.ckpt-5980"

  2. The output is the same when I try `saver.restore(sess, tf.train.latest_checkpoint(r'.\resnet\model'))`.

  3. Passing the full path to saver.restore gives the same output. In all cases the same checkpoint, model.ckpt-5980, was restored.


Answer 1:


Note: This answer will evolve as more information becomes available. I'm not sure this is the most appropriate way to do it, but it feels better than using just comments. Feel free to drop a comment on the answer if this is inappropriate.

About your second attempt:

I don't have much experience with the import_meta_graph method, but if sess.run(logits) runs without complaining, I think the meta graph also contains your input pipeline.

A quick test I just made confirms that the pipeline is indeed restored when you load the metagraph. This means you're not actually passing anything in via feed_dict, because the input is taken from the Dataset-based pipeline that was in use when the checkpoint was saved. From my research, I can't find a way to provide a different input function to a restored metagraph.
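
For reference, a common way to sidestep this is to skip import_meta_graph entirely: rebuild the inference graph around a tf.placeholder and restore only the variables from the checkpoint. A minimal sketch, where build_network is a hypothetical stand-in for whatever constructs the ResNet from an image tensor, and my_image_batch is assumed to be a numpy array of shape [N, 384, 256, 3]:

import tensorflow as tf

tf.reset_default_graph()

# The placeholder replaces the Dataset pipeline, so feed_dict takes effect.
images = tf.placeholder(tf.float32, shape=[None, 384, 256, 3], name='images')
logits = build_network(images, training=False)  # hypothetical model builder
probabilities = tf.nn.softmax(logits)

saver = tf.train.Saver()  # restores variables only, not the old input pipeline
with tf.Session() as sess:
    saver.restore(sess, tf.train.latest_checkpoint(r'./model'))
    preds = sess.run(probabilities, feed_dict={images: my_image_batch})
    print(preds)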

About the first attempt:

Your code looks right to me, so my suspicion is that the checkpoint file that gets loaded is somehow wrong. I asked for some clarifications in a comment; I'll update this part as soon as that info is available.
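
In the meantime, one way to verify which checkpoint is actually loaded, and which variables it contains, is the standard checkpoint utilities; a small sketch:

import tensorflow as tf

ckpt = tf.train.get_checkpoint_state(r'./model')
print(ckpt.model_checkpoint_path)  # the file saver.restore() will use

# List every variable name and shape stored in that checkpoint.
for name, shape in tf.train.list_variables(ckpt.model_checkpoint_path):
    print(name, shape)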




Answer 2:


If you have the model as a .pb or .pbtxt file, inference is easy: using the predictor module we can run the forward pass directly. Check here for more information. For image data it will be something similar to the example below. Hope this helps!

Example code:

import numpy as np
import tensorflow as tf

def extract_data(index=0, filepath='data/cifar-10-batches-bin/data_batch_5.bin'):
    # Read one CIFAR-10 record: 1 label byte followed by 32*32*3 image bytes.
    bytestream = open(filepath, mode='rb')
    label_bytes_length = 1
    image_bytes_length = (32 ** 2) * 3
    record_bytes_length = label_bytes_length + image_bytes_length
    bytestream.seek(record_bytes_length * index, 0)
    label_bytes = bytestream.read(label_bytes_length)
    image_bytes = bytestream.read(image_bytes_length)
    label = np.frombuffer(label_bytes, dtype=np.uint8)
    image = np.frombuffer(image_bytes, dtype=np.uint8)
    image = np.reshape(image, [3, 32, 32])
    image = np.transpose(image, [1, 2, 0])
    image = image.astype(np.float32)
    result = {
        'image': image,
        'label': label,
    }
    bytestream.close()
    return result


# saved_model_dir is the directory produced by export_savedmodel.
predictor_fn = tf.contrib.predictor.from_saved_model(
    export_dir=saved_model_dir, signature_def_key='predictions')
N = 1000
labels = []
images = []
for i in range(N):
    result = extract_data(i)
    images.append(result['image'])
    labels.append(result['label'][0])
output = predictor_fn(
    {
        'images': images,
    }
)
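
Note that tf.contrib.predictor.from_saved_model expects a SavedModel export directory, not raw checkpoint files. An Estimator can produce one with export_savedmodel, given a serving input receiver whose keys match what the predictor is fed ('images' above). A sketch, assuming the classifier from the question and a model_fn that tolerates labels=None in PREDICT mode:

import tensorflow as tf

def serving_input_receiver_fn():
    # The key must match what predictor_fn is called with ('images' above).
    inputs = {'images': tf.placeholder(tf.float32, [None, 32, 32, 3])}
    return tf.estimator.export.ServingInputReceiver(inputs, inputs)

# classifier is the tf.estimator.Estimator from the question.
saved_model_dir = classifier.export_savedmodel(
    export_dir_base='./export',
    serving_input_receiver_fn=serving_input_receiver_fn)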


Source: https://stackoverflow.com/questions/48679622/restoring-a-model-trained-with-tf-estimator-and-feeding-input-through-feed-dict
