Tuning first_stage_anchor_generator in faster rcnn model

Happy的楠姐 asked 2021-02-06 07:21

I am trying to detect some very small objects (~25x25 pixels) in a large image (~2040x1536 pixels) using the Faster RCNN model from the Object Detection API here: https://github.com

2 Answers
  • 2021-02-06 07:48

    As far as I know, proposal anchors are generated only for Faster RCNN model types. This file specifies which parameters may be set for anchor generation on the config line you mentioned.

    I tried setting base_anchor_size, but I failed. However, this FasterRCNNTutorial mentions that:

    [...] you also need to configure the anchor sizes and aspect ratios in the .config file. The base anchor size is 255,255.

    The anchor ratios will multiply the x dimension and divide the y dimension, so if you have an aspect ratio of 0.5 your 255x255 anchor becomes 128x510. Each aspect ratio in the list is applied, then the results are multiplied by the scales. So the first step is to resize your images to the training/testing size, then manually check what the smallest and largest objects you expect are, and what the most extreme aspect ratios will be. Set up the config file with values that will cover these cases when the base anchor size is adjusted by the aspect ratios and multiplied by the scales.

    I think it's pretty straightforward. I also used this 'workaround'. A quick sanity check of that arithmetic is sketched below.
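    The following is a minimal sketch of the arithmetic as the tutorial describes it (the ratio multiplies x and divides y, then each scale is applied). The base anchor size and the 0.5 ratio come from the quote; the scale values are only illustrative. Note that the code excerpt quoted in the other answer uses the square root of the ratio, so the library's exact numbers differ slightly.

        # Sanity check of the tutorial's numbers (not the library's exact code):
        # the aspect ratio multiplies the x dimension and divides the y dimension,
        # then each scale is applied on top of the base anchor.
        base = 255.0               # base anchor size from the quoted tutorial
        aspect_ratio = 0.5         # ratio from the quoted tutorial
        scales = [0.25, 0.5, 1.0]  # illustrative values, not from the original post

        w = base * aspect_ratio    # 255 * 0.5 -> 127.5 (~128)
        h = base / aspect_ratio    # 255 / 0.5 -> 510.0

        for s in scales:
            print(f"scale={s}: {s * w:.0f} x {s * h:.0f}")
        # scale=0.25: 32 x 128
        # scale=0.5: 64 x 255
        # scale=1.0: 128 x 510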

  • 2021-02-06 07:57

    I don't know the actual answer, but I suspect that the way Faster RCNN works in the Tensorflow Object Detection API is as follows:

    this article says: "Anchors play an important role in Faster R-CNN. An anchor is a box. In the default configuration of Faster R-CNN, there are 9 anchors at a position of an image. The following graph shows 9 anchors at the position (320, 320) of an image with size (600, 800)."

    The author also gives an image showing overlapping boxes; those are the proposed regions that may contain the object, based on the "CNN" part of the "RCNN" model. Next comes the "R" part of the "RCNN" model, the region proposal. To do that, another neural network is trained alongside the CNN to figure out the best-fitting box. There are a lot of "proposals" for where an object could be based on all the boxes, but we still don't know where it actually is.

    This "region proposal" neural net's job is to find the correct region and it is trained based on the labels you provide with the coordinates of each object in the image.

    Looking at this file, I noticed:

    line 174:  heights = scales / ratio_sqrts * base_anchor_size[0]

    line 175:  widths = scales * ratio_sqrts * base_anchor_size[1]

    which seems to be the final goal of the configuration found in the config file (to generate a list of sliding windows with known widths and heights), while base_anchor_size defaults to [256, 256]. In the comments, the author of the code wrote:

    "For example, setting scales=[.1, .2, .2] and aspect ratios = [2,2,1/2] means that we create three boxes: one with scale .1, aspect ratio 2, one with scale .2, aspect ratio 2, and one with scale .2 and aspect ratio 1/2. Each box is multiplied by "base_anchor_size" before placing it over its respective center."

    which gives insight into how these boxes are created: the code builds a list of boxes from the scales = [...] and aspect_ratios = [...] parameters, and those boxes are then slid over the image. The scale is fairly straightforward: it is how much the default 256 by 256 square box should be scaled before it is used. The aspect ratio is what turns the original square box into a rectangle that is closer to the (scaled) shape of the objects you expect to encounter. A standalone sketch of that computation is shown below.
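    Here is a small standalone reconstruction of what the quoted lines 174-175 compute, using NumPy instead of the TensorFlow ops in the actual file, with the scales and aspect ratios taken from the docstring quote above:

        import numpy as np

        # Standalone reconstruction of the quoted lines 174-175
        # (the real file uses TensorFlow ops; the arithmetic is the same).
        base_anchor_size = [256.0, 256.0]    # default mentioned above
        scales = np.array([0.1, 0.2, 0.2])   # values from the docstring quote
        aspect_ratios = np.array([2.0, 2.0, 0.5])

        ratio_sqrts = np.sqrt(aspect_ratios)
        heights = scales / ratio_sqrts * base_anchor_size[0]
        widths = scales * ratio_sqrts * base_anchor_size[1]

        for w, h in zip(widths, heights):
            print(f"{w:.1f} x {h:.1f}")
        # 36.2 x 18.1   (scale .1, aspect ratio 2)
        # 72.4 x 36.2   (scale .2, aspect ratio 2)
        # 36.2 x 72.4   (scale .2, aspect ratio 1/2)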

    Meaning, to configure the scales and aspect ratios optimally, you should find the "typical" sizes of the objects in your images, whatever they are (e.g. 20 by 30, 5 by 10, etc.), and figure out how much the default 256 by 256 square box should be scaled to fit them. Then find the "typical" aspect ratios of your objects (according to Google, an aspect ratio is the ratio of the width to the height of an image or screen) and set those as your aspect ratio parameters. A rough calculation for the ~25x25 pixel objects from the question is sketched below.
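    For example, here is a back-of-the-envelope calculation for ~25x25 pixel objects, assuming the image reaches the network at roughly its original size (if an image resizer shrinks it first, the object sizes, and therefore the scales, shrink with it). The object-size bounds are assumptions, not values from the question:

        # Rough sizing of scales/aspect_ratios for ~25x25 px objects (assumed numbers).
        base_anchor_size = 256.0      # library default mentioned above

        smallest_object = 20.0        # assumed smallest side after any resizing
        largest_object = 30.0         # assumed largest side after any resizing

        min_scale = smallest_object / base_anchor_size   # ~0.08
        max_scale = largest_object / base_anchor_size    # ~0.12
        print(min_scale, max_scale)

        # Roughly square objects -> aspect ratios close to 1.0,
        # with a little slack on either side.
        scales = [0.08, 0.1, 0.12]
        aspect_ratios = [0.5, 1.0, 2.0]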

    Note: it seems that the number of elements in the scales and aspect_ratios lists in the config file should be the same, but I don't know for sure.

    Also, I am not sure how to find the optimal stride, but if your objects are smaller than about 16 by 16 pixels, the sliding windows you created by setting the scales and aspect ratios might just skip your object altogether; a toy illustration follows.
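    A toy sketch of that concern (not the library's actual code): anchor centres are laid out on a regular grid whose spacing is the stride, so an object much smaller than the stride can fall between neighbouring centres. The stride value and zero offset here are assumptions:

        # Toy illustration of anchor centre spacing (assumed stride and offset).
        stride = 16                 # a commonly used first-stage stride
        image_width = 2040          # width from the question

        centres = list(range(0, image_width, stride))
        print(centres[:5])          # [0, 16, 32, 48, 64]

        object_size = 25
        # A 25 px object always spans at least one centre when the stride is 16,
        # but an object smaller than the stride could sit entirely between two centres.
        print(object_size >= stride)   # True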
