Question
I intend to use a pre-trained model such as faster_rcnn_resnet101_pets for object detection in TensorFlow, as described here.
I have collected several images for the training and test sets. These images vary in size. Do I have to resize them to a common size?
faster_rcnn_resnet101_pets uses ResNet with an input size of 224x224x3.
Does this mean I have to resize all my images before training, or does TensorFlow take care of it automatically?
python train.py --logtostderr --train_dir=training/ --pipeline_config_path=training/faster_rcnn_resnet101_pets.config
In general, is it good practice to have images of the same size?
Answer 1:
No, you do not need to resize your input images to a fixed shape yourself. The TensorFlow Object Detection API has a preprocessing step that resizes all input images. The following function is defined within that preprocessing step; its image_resizer_fn argument corresponds to the image_resizer field in the config file.
def transform_input_data(tensor_dict,
                         model_preprocess_fn,
                         image_resizer_fn,
                         num_classes,
                         data_augmentation_fn=None,
                         merge_multiple_boxes=False,
                         retain_original_image=False,
                         use_multiclass_scores=False,
                         use_bfloat16=False):
  """A single function that is responsible for all input data transformations.

  Data transformation functions are applied in the following order.
  1. If key fields.InputDataFields.image_additional_channels is present in
     tensor_dict, the additional channels will be merged into
     fields.InputDataFields.image.
  2. data_augmentation_fn (optional): applied on tensor_dict.
  3. model_preprocess_fn: applied only on image tensor in tensor_dict.
  4. image_resizer_fn: applied on original image and instance mask tensor in
     tensor_dict.
  5. one_hot_encoding: applied to classes tensor in tensor_dict.
  6. merge_multiple_boxes (optional): when groundtruth boxes are exactly the
     same they can be merged into a single box with an associated k-hot class
     label.
  ...
  """
According to the proto file, you can choose among 4 different image resizers, namely
- keep_aspect_ratio_resizer
- fixed_shape_resizer
- identity_resizer
- conditional_shape_resizer
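Here is a sample config for the faster_rcnn_resnet101_pets model. It uses keep_aspect_ratio_resizer with min_dimension=600 and max_dimension=1024, so each image is scaled, preserving its aspect ratio, until its shorter side is 600 pixels, unless that would push the longer side past 1024 pixels, in which case the longer side becomes 1024 instead: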
model {
  faster_rcnn {
    num_classes: 37
    image_resizer {
      keep_aspect_ratio_resizer {
        min_dimension: 600
        max_dimension: 1024
      }
    }
    feature_extractor {
      type: 'faster_rcnn_resnet101'
      first_stage_features_stride: 16
    }
    ...
  }
}
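To make the effect of min_dimension and max_dimension concrete, the resizing rule can be sketched in plain Python. This is a minimal re-implementation for illustration only, assuming the resizer first scales the shorter side to min_dimension and falls back to scaling the longer side to max_dimension when the first result is too large. keep_aspect_ratio_size is a hypothetical helper name, not part of the Object Detection API.

```python
def keep_aspect_ratio_size(height, width, min_dimension=600, max_dimension=1024):
    """Return the (height, width) an image would have after resizing.

    Illustrative sketch of the keep_aspect_ratio_resizer rule; not library code.
    """
    # First attempt: scale so that the shorter side equals min_dimension.
    scale = min_dimension / min(height, width)
    new_h, new_w = round(height * scale), round(width * scale)

    # If that pushes the longer side past max_dimension, scale so that the
    # longer side equals max_dimension instead.
    if max(new_h, new_w) > max_dimension:
        scale = max_dimension / max(height, width)
        new_h, new_w = round(height * scale), round(width * scale)
    return new_h, new_w


# Images of different original sizes, all passed through the same rule.
for size in [(480, 640), (1200, 1600), (500, 1500)]:
    print(size, "->", keep_aspect_ratio_size(*size))
```

With the values from the config, a 480x640 image and a 1200x1600 image both come out as 600x800, while a wide 500x1500 image is limited by max_dimension and comes out as 341x1024. The input images can therefore have different sizes, and the resized images can still differ in shape from one another.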
In fact, the size of the resized images strongly affects the trade-off between detection speed and accuracy. Although there is no specific requirement on the input image size, it is better for the smaller dimension of every image to exceed some reasonable minimum so that the convolutional layers work properly.
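One rough way to see why the resized shape matters is to estimate the size of the first-stage feature map. The sketch below assumes the feature map is about 1/16 of the resized image in each dimension, matching first_stage_features_stride: 16 in the config. The rounding is an approximation, and feature_map_size is a hypothetical helper, not an API function.

```python
import math


def feature_map_size(height, width, stride=16):
    """Approximate spatial size of the feature map for a resized image."""
    # Each feature-map cell covers roughly a stride x stride patch of pixels.
    return math.ceil(height / stride), math.ceil(width / stride)


# Resized image shapes: the largest allowed by the config, a smaller
# setting, and a very small image.
for height, width in [(600, 1024), (300, 512), (64, 64)]:
    fh, fw = feature_map_size(height, width)
    print(f"{height}x{width} image -> about {fh}x{fw} feature map "
          f"({fh * fw} locations)")
```

Larger resized images give more feature-map locations to process, which costs time but leaves more detail for small objects. A very small image leaves only a handful of locations for the detector to work with.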
Source: https://stackoverflow.com/questions/56267652/is-it-required-to-have-predefined-image-size-to-use-transfer-learning-in-tensorf