Best strategy to reduce false positives: Google's new Object Detection API on Satellite Imagery

情深已故 2021-01-31 09:44

I'm setting up the new TensorFlow Object Detection API to find small objects in large areas of satellite imagery. It works quite well: it finds all 10 objects I want, but I al…

2 Answers
  • 2021-01-31 10:15

    I've revisited this topic recently in my work and thought I'd update this answer with what I've learned since, for anyone who visits in the future.

    The topic came up on the issue tracker of TensorFlow's models repo. SSD lets you set the ratio of negative to positive examples to mine (max_negatives_per_positive: 3), and you can also set a minimum number of negatives for images with no positives (min_negatives_per_image: 3). Both are defined in the hard_example_miner block of the model > ssd > loss section of the config.
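To make the two settings concrete, here is a minimal pure-Python sketch of ratio-based hard negative selection. It assumes per-anchor losses and a positive/negative mask are already computed, and `select_hard_negatives` is a hypothetical name. The budget, max(max_negatives_per_positive * num_positives, min_negatives_per_image), appears to mirror how the SSD miner combines the two settings, but the real HardExampleMiner also runs NMS over the candidate boxes first, which this sketch leaves out.

```python
from typing import List, Sequence


def select_hard_negatives(losses: Sequence[float],
                          is_positive: Sequence[bool],
                          max_negatives_per_positive: int = 3,
                          min_negatives_per_image: int = 3) -> List[int]:
    """Return indices of the negative anchors kept for the loss (hypothetical helper).

    Negatives are ranked by loss, hardest first, and the top k are kept, where
    k = max(max_negatives_per_positive * num_positives, min_negatives_per_image).
    """
    if len(losses) != len(is_positive):
        raise ValueError("losses and is_positive must have the same length")
    num_positives = sum(1 for p in is_positive if p)
    # The per-image minimum is what lets an image with zero positives still
    # contribute its hardest negatives (likely false positives) to the loss.
    budget = max(max_negatives_per_positive * num_positives,
                 min_negatives_per_image)
    negatives = [i for i, p in enumerate(is_positive) if not p]
    negatives.sort(key=lambda i: losses[i], reverse=True)
    return negatives[:budget]


if __name__ == "__main__":
    # One positive anchor -> up to 3 negatives, hardest first.
    print(select_hard_negatives([0.9, 0.1, 0.8, 0.7, 0.05, 0.6],
                                [False, True, False, False, False, False]))  # -> [0, 2, 3]
    # An image with no positives still yields min_negatives_per_image negatives.
    print(select_hard_negatives([0.2, 0.9, 0.5, 0.1], [False] * 4))  # -> [1, 2, 0]
```

Setting min_negatives_per_image to 0 would make background-only images contribute nothing, which is exactly the case you care about for false positives.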

    That said, I don't see the same option in Faster R-CNN's model configuration. According to the issue, models/research/object_detection/core/balanced_positive_negative_sampler.py contains the sampling code used for Faster R-CNN.
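For intuition, the idea behind a balanced positive/negative sampler fits in a few lines of plain Python. This is a simplified illustration, not the code in balanced_positive_negative_sampler.py: it caps positives at positive_fraction * batch_size and fills the remaining slots with randomly chosen negatives. The name `balanced_subsample` is hypothetical. In the Faster R-CNN config the related knobs appear to be `first_stage_positive_balance_fraction` and `second_stage_balance_fraction`; treat those field names as an assumption to verify against faster_rcnn.proto.

```python
import random
from typing import List, Optional, Sequence


def balanced_subsample(is_positive: Sequence[bool],
                       batch_size: int = 64,
                       positive_fraction: float = 0.25,
                       seed: Optional[int] = None) -> List[int]:
    """Pick up to batch_size indices with at most positive_fraction positives.

    Hypothetical sketch: positives are capped, and whatever room is left in
    the minibatch goes to randomly sampled negatives.
    """
    rng = random.Random(seed)
    positives = [i for i, p in enumerate(is_positive) if p]
    negatives = [i for i, p in enumerate(is_positive) if not p]
    max_positives = int(positive_fraction * batch_size)
    chosen_pos = rng.sample(positives, min(max_positives, len(positives)))
    # Negatives fill the remaining slots, so an image with no positives
    # contributes a full minibatch of background examples.
    num_negatives = min(batch_size - len(chosen_pos), len(negatives))
    chosen_neg = rng.sample(negatives, num_negatives)
    return sorted(chosen_pos + chosen_neg)


if __name__ == "__main__":
    labels = [True] * 30 + [False] * 200
    batch = balanced_subsample(labels, batch_size=64, positive_fraction=0.25, seed=0)
    print(len(batch), sum(labels[i] for i in batch))  # -> 64 16
```

Note that negatives are sampled at random here rather than by loss, which is the main difference from the SSD-style hard example miner.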

    Another option discussed in the issue is to create a second class specifically for lookalikes. During training, the model then has to learn the differences between the two classes, which should help with your false positives.
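A sketch of the lookalike-class setup: the helper below renders a label map in the Object Detection API's `item { id: ... name: ... }` text format with a second, decoy class. The class names `target` and `lookalike` are placeholders and `make_label_map` is a hypothetical helper; ids start at 1 because 0 is reserved for the background class.

```python
from typing import Sequence


def make_label_map(class_names: Sequence[str]) -> str:
    """Render class names as label_map.pbtxt text, with ids starting at 1."""
    items = []
    for class_id, name in enumerate(class_names, start=1):  # id 0 = background
        items.append("item {\n  id: %d\n  name: '%s'\n}\n" % (class_id, name))
    return "".join(items)


if __name__ == "__main__":
    # 'target' is the object you care about; 'lookalike' is annotated on the
    # regions that currently trigger false positives.
    print(make_label_map(["target", "lookalike"]))
```

num_classes in the model config has to match the number of items (2 here), and detections of the lookalike class are simply discarded at inference.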

    Lastly, I came across this article on Filter Amplifier Networks (FAN) that may be informative for your work on aerial imagery.

    ===================================================================

    The following paper uses hard negative mining for exactly the purpose you describe: Training Region-based Object Detectors with Online Hard Example Mining.

    In Section 3.1 they describe how regions are split into foreground and background classes:

    Background RoIs. A region is labeled background (bg) if its maximum IoU with ground truth is in the interval [bg_lo, 0.5). A lower threshold of bg_lo = 0.1 is used by both FRCN and SPPnet, and is hypothesized in [14] to crudely approximate hard negative mining; the assumption is that regions with some overlap with the ground truth are more likely to be the confusing or hard ones. We show in Section 5.4 that although this heuristic helps convergence and detection accuracy, it is suboptimal because it ignores some infrequent, but important, difficult background regions. Our method removes the bg_lo threshold.
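The labeling rule in that passage can be written out directly. This sketch assumes boxes are (xmin, ymin, xmax, ymax) tuples and the usual Fast R-CNN foreground threshold of IoU >= 0.5; `iou` and `label_roi` are hypothetical helper names, not code from the paper.

```python
from typing import Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def label_roi(roi: Box, gt_boxes: Sequence[Box],
              bg_lo: float = 0.1, fg_thresh: float = 0.5) -> Optional[str]:
    """Return 'fg', 'bg', or None (region not used for training)."""
    max_iou = max((iou(roi, gt) for gt in gt_boxes), default=0.0)
    if max_iou >= fg_thresh:
        return "fg"
    if max_iou >= bg_lo:  # background interval [bg_lo, fg_thresh)
        return "bg"
    return None


if __name__ == "__main__":
    gt = [(0, 0, 10, 10)]
    far_away = (20, 20, 30, 30)                # IoU 0 with the ground truth
    print(label_roi(far_away, gt))             # -> None (ignored with bg_lo = 0.1)
    print(label_roi(far_away, gt, bg_lo=0.0))  # -> bg (OHEM-style, no bg_lo)
```

This also shows why the threshold matters for images with no ground truth at all: every region has a maximum IoU of 0, so with bg_lo = 0.1 none of them is used as background, while with the threshold removed they all are.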

    In fact, this paper is cited, and its ideas are used, in the hard example mining code of the TensorFlow Object Detection API's losses.py:

    class HardExampleMiner(object):
        """Hard example mining for regions in a list of images.

        Implements hard example mining to select a subset of regions to be
        back-propagated. For each image, selects the regions with highest losses,
        subject to the condition that a newly selected region cannot have
        an IOU > iou_threshold with any of the previously selected regions.
        This can be achieved by re-using a greedy non-maximum suppression algorithm.
        A constraint on the number of negatives mined per positive region can also be
        enforced.

        Reference papers: "Training Region-based Object Detectors with Online
        Hard Example Mining" (CVPR 2016) by Srivastava et al., and
        "SSD: Single Shot MultiBox Detector" (ECCV 2016) by Liu et al.
        """
        # Methods omitted here; see object_detection/core/losses.py.
    

    Based on your model config file, the HardExampleMiner object is built by this function in losses_builder.py:

    from object_detection.core import losses
    from object_detection.protos import losses_pb2


    def build_hard_example_miner(config,
                                 classification_weight,
                                 localization_weight):
        """Builds hard example miner based on the config.

        Args:
            config: A losses_pb2.HardExampleMiner object.
            classification_weight: Classification loss weight.
            localization_weight: Localization loss weight.

        Returns:
            Hard example miner.
        """
        # Map the proto enum onto the string the miner expects.
        loss_type = None
        if config.loss_type == losses_pb2.HardExampleMiner.BOTH:
            loss_type = 'both'
        if config.loss_type == losses_pb2.HardExampleMiner.CLASSIFICATION:
            loss_type = 'cls'
        if config.loss_type == losses_pb2.HardExampleMiner.LOCALIZATION:
            loss_type = 'loc'

        # Non-positive values in the config are treated as unset and passed
        # as None, which disables that limit in the miner.
        max_negatives_per_positive = None
        num_hard_examples = None
        if config.max_negatives_per_positive > 0:
            max_negatives_per_positive = config.max_negatives_per_positive
        if config.num_hard_examples > 0:
            num_hard_examples = config.num_hard_examples
        hard_example_miner = losses.HardExampleMiner(
            num_hard_examples=num_hard_examples,
            iou_threshold=config.iou_threshold,
            loss_type=loss_type,
            cls_loss_weight=classification_weight,
            loc_loss_weight=localization_weight,
            max_negatives_per_positive=max_negatives_per_positive,
            min_negatives_per_image=config.min_negatives_per_image)
        return hard_example_miner
    

    The resulting miner is passed into the model built by model_builder.py, which train.py calls. So it seems to me that simply labeling your true positives (with a tool like LabelImg or RectLabel) should be enough for the training algorithm to find hard negatives within the same images. The related question gives an excellent walkthrough.

    If you want to feed in images that contain no true positives (i.e. nothing in the image should be detected), just add those negative images to your TFRecord with no bounding boxes.

  • 2021-01-31 10:19

    I think I went through the same or a similar scenario, so it's worth sharing with you.

    I managed to solve it by passing images without annotations to the trainer.

    In my scenario, I'm building a project to detect assembly failures in my client's products in real time. I achieved very robust results (good enough for a production environment) by using detection + classification for components that have an explicit negative pattern (e.g. a screw, which is either screwed in or absent, leaving just the hole) and detection only for things that don't have a negative pattern (e.g. a piece of tape that can be placed anywhere).

    In the system, the user must record two videos, one containing the positive scenario and another containing the negative (or n videos containing n positive and negative patterns, so the algorithm can generalize).

    After some testing, I found that if I registered only the tape for detection, the detector gave very confident (0.999) false-positive tape detections: it was learning the pattern of the spot where the tape is applied instead of the tape itself. When I also had another component (like a screw in its negative form), I was implicitly passing the negative pattern of the tape without being aware of it, so the FPs didn't happen.

    So in this scenario I had to pass images without tape so the detector could differentiate between tape and no tape.

    I considered two alternatives to try to fix this behavior:

    1. Train with a considerable number of images that have no annotations (10% of all my negative samples) along with all the images that have real annotations.
    2. For the images without annotations, create a dummy annotation with a dummy label to force the detector to train on them (thus learning the no-tape pattern). Later, when the detector outputs dummy predictions, just ignore them.
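Both alternatives come down to a bit of bookkeeping around the dataset and the detector output. The sketch below is hypothetical: `sample_unannotated` draws a fixed fraction of the annotation-free images (alternative 1), and `drop_dummy_detections` discards predictions of the dummy label at inference (alternative 2). The dummy class id, the score threshold and the flat (class_id, score, box) detection tuples are assumptions, not the API's output format.

```python
import random
from typing import Iterable, List, Sequence, Tuple

# (class_id, score, (ymin, xmin, ymax, xmax)): an assumed, flattened format.
Detection = Tuple[int, float, Tuple[float, float, float, float]]


def sample_unannotated(image_paths: Sequence[str], fraction: float = 0.1,
                       seed: int = 0) -> List[str]:
    """Alternative 1: keep a reproducible random fraction of annotation-free images."""
    k = round(len(image_paths) * fraction)
    return sorted(random.Random(seed).sample(list(image_paths), k))


def drop_dummy_detections(detections: Iterable[Detection], dummy_class_id: int,
                          min_score: float = 0.5) -> List[Detection]:
    """Alternative 2: ignore predictions of the dummy 'no object' label."""
    return [d for d in detections
            if d[0] != dummy_class_id and d[1] >= min_score]


if __name__ == "__main__":
    negatives = ["no_tape_%03d.jpg" % i for i in range(50)]
    print(sample_unannotated(negatives, fraction=0.1, seed=3))
    dets = [(1, 0.99, (0.1, 0.1, 0.2, 0.2)), (2, 0.95, (0.0, 0.0, 1.0, 1.0))]
    print(drop_dummy_detections(dets, dummy_class_id=2))  # -> [(1, 0.99, (0.1, 0.1, 0.2, 0.2))]
```

Fixing the seed keeps the negative subset stable between training runs, which makes it easier to compare experiments.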

    I concluded that both alternatives worked perfectly in my scenario. The training loss got a little noisy, but the predictions are robust in my very controlled setting (the system's camera has its own enclosure and lighting to reduce variables).

    I had to make two small modifications for the first alternative to work:

    1. For every image without annotations, I added a dummy annotation (class=None, xmin/ymin/xmax/ymax=-1).
    2. When generating the TFRecord files, I use this marker (xmin == -1) to write empty box lists for that sample:
    import io
    import os

    import pandas as pd
    import tensorflow as tf
    from PIL import Image
    from object_detection.utils import dataset_util


    def create_tf_example(group, path, label_map):
        # `group` holds one image's filename plus a DataFrame of its annotation
        # rows (group.filename, group.object); label_map maps class name -> id.
        with tf.io.gfile.GFile(os.path.join(path, group.filename), 'rb') as fid:
            encoded_jpg = fid.read()
        image = Image.open(io.BytesIO(encoded_jpg))
        width, height = image.size

        filename = group.filename.encode('utf8')
        image_format = b'jpg'

        xmins, xmaxs, ymins, ymaxs = [], [], [], []
        classes_text, classes = [], []

        for _, row in group.object.iterrows():
            # Empty rows and dummy rows (xmin == -1) mark images with no objects:
            # skip them so the image is written with empty box lists (a pure negative).
            if pd.isnull(row['xmin']) or row['xmin'] == -1:
                continue
            # Box coordinates are normalized to [0, 1] by the image size.
            xmins.append(row['xmin'] / width)
            xmaxs.append(row['xmax'] / width)
            ymins.append(row['ymin'] / height)
            ymaxs.append(row['ymax'] / height)
            classes_text.append(row['class'].encode('utf8'))
            classes.append(label_map[row['class']])

        tf_example = tf.train.Example(features=tf.train.Features(feature={
            'image/height': dataset_util.int64_feature(height),
            'image/width': dataset_util.int64_feature(width),
            'image/filename': dataset_util.bytes_feature(filename),
            'image/source_id': dataset_util.bytes_feature(filename),
            'image/encoded': dataset_util.bytes_feature(encoded_jpg),
            'image/format': dataset_util.bytes_feature(image_format),
            'image/object/bbox/xmin': dataset_util.float_list_feature(xmins),
            'image/object/bbox/xmax': dataset_util.float_list_feature(xmaxs),
            'image/object/bbox/ymin': dataset_util.float_list_feature(ymins),
            'image/object/bbox/ymax': dataset_util.float_list_feature(ymaxs),
            'image/object/class/text': dataset_util.bytes_list_feature(classes_text),
            'image/object/class/label': dataset_util.int64_list_feature(classes),
        }))
        return tf_example
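Step 1 (the dummy annotation rows) could be produced with something like the following. It is a sketch that assumes a one-row-per-box CSV with the columns filename, width, height, class, xmin, ymin, xmax, ymax, and a hypothetical `image_sizes` mapping that lists every image in the dataset; only the class=None and -1 coordinate convention comes from the steps above. The csv module writes None as an empty field, which pandas reads back as NaN.

```python
import csv
import io
from typing import Dict, List, Sequence, TextIO, Tuple

# Assumed column layout of the annotation CSV.
FIELDS = ["filename", "width", "height", "class", "xmin", "ymin", "xmax", "ymax"]


def add_dummy_rows(rows: Sequence[Dict],
                   image_sizes: Dict[str, Tuple[int, int]]) -> List[Dict]:
    """Append one dummy row (class=None, coords=-1) per image with no boxes."""
    annotated = {row["filename"] for row in rows}
    out = list(rows)
    for filename, (width, height) in sorted(image_sizes.items()):
        if filename not in annotated:
            out.append({"filename": filename, "width": width, "height": height,
                        "class": None, "xmin": -1, "ymin": -1, "xmax": -1, "ymax": -1})
    return out


def write_rows(rows: Sequence[Dict], fp: TextIO) -> None:
    """Write the rows, header first, to an open text file."""
    writer = csv.DictWriter(fp, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    rows = [{"filename": "tape_01.jpg", "width": 640, "height": 480, "class": "tape",
             "xmin": 100, "ymin": 120, "xmax": 180, "ymax": 160}]
    sizes = {"tape_01.jpg": (640, 480), "no_tape_01.jpg": (640, 480)}
    buf = io.StringIO()
    write_rows(add_dummy_rows(rows, sizes), buf)
    print(buf.getvalue())
```

With these rows in place, the image still appears in the grouped DataFrame, and `create_tf_example` skips the -1 row so the example is written with no boxes.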
    

    [Figure: part of the training progress]

    I'm currently using the TensorFlow Object Detection API with tensorflow==1.15 and faster_rcnn_resnet101_coco.config.

    I hope this solves someone's problem, as I didn't find any solution online. I read many people saying that Faster R-CNN isn't suited to negative training for reducing FPs, but my tests showed the opposite.
