I'm using the TensorFlow object detection API and working on a pretrained ssd-mobilenet model. Is there a way to extract the last global pooling of the MobileNet for each bbox as a feature vector?
As Steve said, the feature vectors in Faster RCNN in the object detection API seem to get dropped after the SecondStageBoxPredictor. I was able to thread them through the network by modifying core/box_predictor.py and meta_architectures/faster_rcnn_meta_arch.py.
The crux of it is that the non-max suppression code actually has a parameter for additional_fields (see core/post_processing.py:176 on master). You can pass a dict of tensors whose first two dimensions match those of the boxes and scores, and the function will return them filtered the same way the boxes and scores were filtered. Here's a diff against master of the changes I made:
https://gist.github.com/donniet/c95d19e00ff9abeb786415b3a9348e62
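To see the mechanics in isolation, here's a minimal standalone sketch (plain TensorFlow 1.x, not the object detection code itself): the indices that non-max suppression keeps for the boxes can be used to gather any per-box tensor, such as a feature vector, so every extra field stays aligned with the surviving detections. The 2048-dimensional feature size is just an assumption for illustration.
import tensorflow as tf
# per-box tensors: boxes, scores, and feature vectors (size assumed)
boxes = tf.placeholder(tf.float32, shape=(None, 4))
scores = tf.placeholder(tf.float32, shape=(None,))
features = tf.placeholder(tf.float32, shape=(None, 2048))
# indices of the boxes that survive non-max suppression
keep = tf.image.non_max_suppression(boxes, scores, max_output_size=100, iou_threshold=0.6)
# gather with the same indices so each tensor stays aligned with its box
nmsed_boxes = tf.gather(boxes, keep)
nmsed_scores = tf.gather(scores, keep)
nmsed_features = tf.gather(features, keep)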
Then, instead of loading a frozen graph, I had to rebuild the network and load the variables from a checkpoint. (Note: I downloaded the Faster RCNN checkpoint from http://download.tensorflow.org/models/object_detection/faster_rcnn_resnet101_coco_2018_01_28.tar.gz.)
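If it helps, fetching and unpacking that tarball from Python 3 could look like this (a hedged sketch; the extraction path is an assumption chosen to match the restore path below):
import tarfile
import urllib.request
# download the pretrained checkpoint tarball (URL from the note above)
url = 'http://download.tensorflow.org/models/object_detection/faster_rcnn_resnet101_coco_2018_01_28.tar.gz'
filename, _ = urllib.request.urlretrieve(url, 'faster_rcnn_resnet101_coco_2018_01_28.tar.gz')
# unpack next to the object_detection package so the restore path below resolves
with tarfile.open(filename) as tar:
    tar.extractall('object_detection/')
With the checkpoint unpacked, the network can be rebuilt and restored: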
import tensorflow as tf
from google.protobuf import text_format
from object_detection.builders import model_builder
from object_detection.protos import pipeline_pb2
# load the pipeline structure from the config file
with open('object_detection/samples/configs/faster_rcnn_resnet101_coco.config', 'r') as content_file:
content = content_file.read()
# build the model with model_builder
pipeline_proto = pipeline_pb2.TrainEvalPipelineConfig()
text_format.Merge(content, pipeline_proto)
model = model_builder.build(pipeline_proto.model, is_training=False)
# construct a network using the model
image_placeholder = tf.placeholder(shape=(None,None,3), dtype=tf.uint8, name='input')
original_image = tf.expand_dims(image_placeholder, 0)
preprocessed_image, true_image_shapes = model.preprocess(tf.to_float(original_image))
prediction_dict = model.predict(preprocessed_image, true_image_shapes)
detections = model.postprocess(prediction_dict, true_image_shapes)
# create an input network to read a file
filename_placeholder = tf.placeholder(name='file_name', dtype=tf.string)
image_file = tf.read_file(filename_placeholder)
image_data = tf.image.decode_image(image_file)
# load the variables from a checkpoint
init_saver = tf.train.Saver()
sess = tf.Session()
init_saver.restore(sess, 'object_detection/faster_rcnn_resnet101_coco_2018_01_28/model.ckpt')
# get the image data
blob = sess.run(image_data, feed_dict={filename_placeholder:'image.jpeg'})
# process the inference
output = sess.run(detections, feed_dict={image_placeholder:blob})
# get the shape of the image_features threaded through by the patch above
print(output['image_features'].shape)
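From there the features can be paired with the regular detection outputs. A hedged example (detection_boxes, detection_scores, and num_detections are the API's standard postprocess keys; image_features is the field added by the patch):
# iterate over the kept detections and their feature vectors
num_detections = int(output['num_detections'][0])
for i in range(num_detections):
    box = output['detection_boxes'][0][i]      # [ymin, xmin, ymax, xmax], normalized
    score = output['detection_scores'][0][i]
    feature = output['image_features'][0][i]   # the per-box feature vector
    print(box, score, feature.shape)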
Caveat: I didn't run the TensorFlow unit tests against the changes I made, so consider them for demo purposes only; more testing should be done to make sure they didn't break something else in the object detection API.