What the impact of different dimension of image resizer when using default config of object detection api

前端未结

关注

 1  1221

攒了一身酷 2021-02-04 11:02

I was trying to use the object detection API of Tensorflow to train a model. And I was using the sample config of faster rcnn resnet101 (https://github.com/tensorflow/models/blo

1条回答

鱼传尺愫 (楼主)

2021-02-04 11:52

After some tests, I guess I find the answer. Please correct me if there is anything wrong.

In .config file:

image_resizer {
  keep_aspect_ratio_resizer {
    min_dimension: 600
    max_dimension: 1024
  }
}

According to the image resizer setting of 'object_detection/builders/image_resizer_builder.py'

if image_resizer_config.WhichOneof(
    'image_resizer_oneof') == 'keep_aspect_ratio_resizer':
  keep_aspect_ratio_config = image_resizer_config.keep_aspect_ratio_resizer
  if not (keep_aspect_ratio_config.min_dimension
          <= keep_aspect_ratio_config.max_dimension):
    raise ValueError('min_dimension > max_dimension')
  return functools.partial(
      preprocessor.resize_to_range,
      min_dimension=keep_aspect_ratio_config.min_dimension,
      max_dimension=keep_aspect_ratio_config.max_dimension)

Then it tries to use 'resize_to_range' function of 'object_detection/core/preprocessor.py'

  with tf.name_scope('ResizeToRange', values=[image, min_dimension]):
    image_shape = tf.shape(image)
    orig_height = tf.to_float(image_shape[0])
    orig_width = tf.to_float(image_shape[1])
    orig_min_dim = tf.minimum(orig_height, orig_width)

    # Calculates the larger of the possible sizes
    min_dimension = tf.constant(min_dimension, dtype=tf.float32)
    large_scale_factor = min_dimension / orig_min_dim
    # Scaling orig_(height|width) by large_scale_factor will make the smaller
    # dimension equal to min_dimension, save for floating point rounding errors.
    # For reasonably-sized images, taking the nearest integer will reliably
    # eliminate this error.
    large_height = tf.to_int32(tf.round(orig_height * large_scale_factor))
    large_width = tf.to_int32(tf.round(orig_width * large_scale_factor))
    large_size = tf.stack([large_height, large_width])

    if max_dimension:
      # Calculates the smaller of the possible sizes, use that if the larger
      # is too big.
      orig_max_dim = tf.maximum(orig_height, orig_width)
      max_dimension = tf.constant(max_dimension, dtype=tf.float32)
      small_scale_factor = max_dimension / orig_max_dim
      # Scaling orig_(height|width) by small_scale_factor will make the larger
      # dimension equal to max_dimension, save for floating point rounding
      # errors. For reasonably-sized images, taking the nearest integer will
      # reliably eliminate this error.
      small_height = tf.to_int32(tf.round(orig_height * small_scale_factor))
      small_width = tf.to_int32(tf.round(orig_width * small_scale_factor))
      small_size = tf.stack([small_height, small_width])

      new_size = tf.cond(
          tf.to_float(tf.reduce_max(large_size)) > max_dimension,
          lambda: small_size, lambda: large_size)
    else:
      new_size = large_size

    new_image = tf.image.resize_images(image, new_size,
                                       align_corners=align_corners)

From the above code, we can know if we have an image whose size is 800*1000. The size of final output image will be 600*750.

That is, this image resizer will always resize your input image according to the setting of 'min_dimension' and 'max_dimension'.

0 讨论(0)