TensorFlow Faster RCNN Source Code Walkthrough (TFFRCNN) (19): rpn_msr/proposal_target_layer_tf.py

This post is part of a series of code notes on the CharlesShang/TFFRCNN source code on GitHub.

---------------Personal study notes---------------

----------------Author: 吴疆--------------

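1. proposal_target_layer(rpn_rois, gt_boxes, gt_ishard, dontcare_areas, _num_classes): code logic

Assign all_rois = rpn_rois, then remove the hard boxes (gt_hardboxes) from gt_boxes to obtain gt_easyboxes --->

Extend all_rois (shape (None, 5), first column is the all-zero batch_ind) so that it consists of three parts: rpn_rois + gt_easyboxes + jittered_gt_boxes. jittered_gt_boxes is obtained by jittering gt_easyboxes via _jitter_gt_boxes(...), whose docstring says the jitter is meant to make classification and regression more robust. Why extend all_rois at all? The source comment only says "Include ground-truth boxes in the set of candidate rois". My guess is that the proposals used for training should also contain the gt boxes, not only the proposals produced by the RPN, which helps train the RCNN subnet --->

Compute rois_per_image = cfg.TRAIN.BATCH_SIZE / num_images = 128 / 1 = 128 (RPN training uses 256 anchors, RCNN subnet training uses 128 proposals) --->

Call _sample_rois(...) to get labels (128*1), rois (128*5, first column is the all-zero batch_ind), the regression targets bbox_targets (128*4K, where K is the number of classes, 21 by default for PASCAL VOC) and bbox_inside_weights (128*4K). 128 is the default upper bound; fewer rows come back when there are not enough candidates --->

Create bbox_outside_weights = np.array(bbox_inside_weights > 0).astype(np.float32) --->

Return rois, labels, bbox_targets, bbox_inside_weights, bbox_outside_weights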

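# Returns information on the 128 sampled proposals: rois (128*5, first column is the all-zero batch_ind); labels (128*1), the gt class label of each proposal, with negative proposals labeled 0;
# bbox_targets (128*4K): regression targets of the proposals, non-zero only in the slots of the gt class, all 0 elsewhere;
# bbox_inside_weights: [1, 1, 1, 1] in the slots of the gt class, all 0 elsewhere;
# bbox_outside_weights: [1, 1, 1, 1] in the slots of the gt class, all 0 elsewhere;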
def proposal_target_layer(rpn_rois, gt_boxes, gt_ishard, dontcare_areas, _num_classes):
    """
    Assign object detection proposals to ground-truth targets. Produces proposal
    classification labels and bounding-box regression targets.
    Parameters
    ----------
    rpn_rois:  (1 x H x W x A, 5) [0, x1, y1, x2, y2]  # the RPN actually outputs fewer than 1 x H x W x A rois: 2000 in the training phase, 300 in the testing phase
    gt_boxes: (G, 5) [x1 ,y1 ,x2, y2, class] int
    gt_ishard: (G, 1) {0 | 1} 1 indicates hard
    dontcare_areas: (D, 4) [ x1, y1, x2, y2]
    _num_classes
    ----------
    Returns
    ----------
    rois: (1 x H x W x A, 5) [0, x1, y1, x2, y2]
    labels: (1 x H x W x A, 1) {0,1,...,_num_classes-1}
    bbox_targets: (1 x H x W x A, K x4) [dx1, dy1, dx2, dy2]
    bbox_inside_weights: (1 x H x W x A, Kx4) 0, 1 masks for the computing loss
    bbox_outside_weights: (1 x H x W x A, Kx4) 0, 1 masks for the computing loss
    """

    # Proposal ROIs (0, x1, y1, x2, y2) coming from RPN
    # (i.e., rpn.proposal_layer.ProposalLayer), or any other source
    all_rois = rpn_rois
    # TODO(rbg): it's annoying that sometimes I have extra info before
    # and other times after box coordinates -- normalize to one format

    # Include ground-truth boxes in the set of candidate rois
    # TRAIN.PRECLUDE_HARD_SAMPLES = True by default
    if cfg.TRAIN.PRECLUDE_HARD_SAMPLES and gt_ishard is not None and gt_ishard.shape[0] > 0:
        assert gt_ishard.shape[0] == gt_boxes.shape[0]
        gt_ishard = gt_ishard.astype(int)
        # Drop the boxes flagged in gt_ishard to get gt_easyboxes; why is this handled differently from anchor_target_layer_tf.py???
        gt_easyboxes = gt_boxes[gt_ishard != 1, :]
    else:
        gt_easyboxes = gt_boxes

    """
    add the ground-truth to rois will cause zero loss! not good for visuallization
    """
    jittered_gt_boxes = _jitter_gt_boxes(gt_easyboxes)
    zeros = np.zeros((gt_easyboxes.shape[0] * 2, 1), dtype=gt_easyboxes.dtype)
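    # all_rois now consists of the original all_rois plus gt_easyboxes and jittered_gt_boxes, each prefixed with batch_ind 0???
    # What is the point of extending all_rois??? (per the source comment above: the gt boxes become candidate rois too)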
    all_rois = np.vstack((all_rois, \
         np.hstack((zeros, np.vstack((gt_easyboxes[:, :-1], jittered_gt_boxes[:, :-1]))))))

    # batch_ind must be 0 for every roi!!!
    # Sanity check: single batch only
    assert np.all(all_rois[:, 0] == 0), \
            'Only single item batches are supported'

    num_images = 1
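    # TRAIN.BATCH_SIZE = 128 by default, unlike TRAIN.RPN_BATCHSIZE = 256!!!
    # (this is Python 2 code, so / on two ints is integer division; under Python 3 it would yield the float 128.0)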
    rois_per_image = cfg.TRAIN.BATCH_SIZE / num_images
    # TRAIN.FG_FRACTION = 0.25 by default (fg:bg = 1:3), unlike TRAIN.RPN_FG_FRACTION = 0.5 (1:1)!!!
    fg_rois_per_image = int(np.round(cfg.TRAIN.FG_FRACTION * rois_per_image))

    # Sample rois with classification labels and bounding box regression targets
    labels, rois, bbox_targets, bbox_inside_weights = _sample_rois(
        all_rois, gt_boxes, gt_ishard, dontcare_areas, fg_rois_per_image,
        rois_per_image, _num_classes)

    # _count = 1
    # if DEBUG:
    #     if _count == 1:
    #         _fg_num, _bg_num = 0, 0
    #     print 'num fg: {}'.format((labels > 0).sum())
    #     print 'num bg: {}'.format((labels == 0).sum())
    #     _count += 1
    #     _fg_num += (labels > 0).sum()
    #     _bg_num += (labels == 0).sum()
    #     print 'num fg avg: {}'.format(_fg_num / _count)
    #     print 'num bg avg: {}'.format(_bg_num / _count)
    #     print 'ratio: {:.3f}'.format(float(_fg_num) / float(_bg_num))

    rois = rois.reshape(-1, 5)
    labels = labels.reshape(-1, 1)
    bbox_targets = bbox_targets.reshape(-1, _num_classes*4)
    bbox_inside_weights = bbox_inside_weights.reshape(-1, _num_classes*4)
    bbox_outside_weights = np.array(bbox_inside_weights > 0).astype(np.float32)
    return rois, labels, bbox_targets, bbox_inside_weights, bbox_outside_weights
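
To make the stacking step and the batch arithmetic above concrete, here is a minimal standalone sketch. The helper name build_candidate_rois and the toy boxes are my own, purely for illustration, and the config values are hard-coded to the defaults quoted in these notes rather than read from cfg.

```python
import numpy as np

def build_candidate_rois(rpn_rois, gt_easyboxes, jittered_gt_boxes):
    """Hypothetical helper: stack RPN rois with gt boxes and jittered gt boxes.

    rpn_rois is (R, 5) [batch_ind, x1, y1, x2, y2]; the gt arrays are
    (G, 5) [x1, y1, x2, y2, class], so the class column is dropped and a
    zero batch_ind column is prepended.
    """
    extra = np.vstack((gt_easyboxes[:, :-1], jittered_gt_boxes[:, :-1]))
    zeros = np.zeros((extra.shape[0], 1), dtype=extra.dtype)
    return np.vstack((rpn_rois, np.hstack((zeros, extra))))

# Toy inputs (made up for the example)
rpn_rois = np.array([[0, 10, 10, 50, 50],
                     [0, 20, 30, 80, 90]], dtype=np.float32)
gt_easyboxes = np.array([[12, 11, 52, 49, 7]], dtype=np.float32)
jittered = gt_easyboxes + np.array([1, -1, 1, -1, 0], dtype=np.float32)

all_rois = build_candidate_rois(rpn_rois, gt_easyboxes, jittered)

# Defaults quoted in the notes: BATCH_SIZE = 128, FG_FRACTION = 0.25
batch_size, fg_fraction, num_images = 128, 0.25, 1
rois_per_image = batch_size // num_images          # // keeps an int on Python 3
fg_rois_per_image = int(np.round(fg_fraction * rois_per_image))

print(all_rois.shape, rois_per_image, fg_rois_per_image)
```

With one gt box, the candidate set grows from 2 to 4 rows (the gt box and its jittered copy), and the foreground quota works out to 32 of 128.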

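2. _sample_rois(all_rois, gt_boxes, gt_ishard, dontcare_areas, fg_rois_per_image, rois_per_image, num_classes): code logic

Call bbox_overlaps(...) (from utils/cython_bbox.so) to compute the IoU between all_rois[:, 1:5] and gt_boxes[:, :4] --->

For each roi in all_rois, get the index of the gt box with the highest IoU (gt_assignment) and that IoU value (max_overlaps), then use gt_assignment to look up each roi's gt class label (labels) --->

Exclude hard examples: call bbox_overlaps(...) (from utils/cython_bbox.so) to compute the IoU between all_rois[:, 1:5] and gt_hardboxes[:, :4]; any roi whose max IoU with the gt_hardboxes is >= 0.5 (cfg.TRAIN.FG_THRESH) is excluded --->

Exclude dontcare areas: call bbox_intersections(...) (from utils/cython_bbox.so) to compute the intersections between dontcare_areas and all_rois[:, 1:5]; any roi whose summed intersection over all dontcare_areas is > 0.5 (cfg.TRAIN.DONTCARE_AREA_INTERSECTION_HI) is excluded --->

Sample positive proposals: proposals whose max IoU with a gt box is >= 0.5 are positives; randomly sample up to 32 of them, and if fewer than 32 exist the shortfall is filled with negatives (positive:negative ratio of 1:3, 128 samples in total) --->

Sample negative proposals: proposals whose max IoU with a gt box lies in [0.1, 0.5) are negatives; randomly sample 96 of them (more when positives fall short) ---> (The sampling can indeed pick the gt boxes themselves, since gt_easyboxes were appended to all_rois and have IoU 1.0 with themselves. I am still not sure about the intent behind all_rois.)

Update labels (shape (128, 1)), setting the label of negative proposals to 0, and update rois (shape (128, 5)) --->

Call _compute_targets(...) to compute the normalized regression targets bbox_target_data, shape (128, 5): column 1 is the class and columns 2 to 5 are the normalized regression targets of the proposals --->

Call _get_bbox_regression_labels(...) to expand the bbox_target_data of the 128 proposals (128*5, column 1 is the gt class) into bbox_targets (128*(4K)), where the slots of the matching class hold the regression targets and everything else is 0, and to build bbox_inside_weights (128*(4K)), which is 1.0 1.0 1.0 1.0 in the slots of the matching class and 0 elsewhere --->

Return labels (128*1), rois (128*5, first column is the all-zero batch_ind), the regression targets bbox_targets (128*4K, where K is the number of classes, 21 by default for PASCAL VOC) and bbox_inside_weights (128*4K); called by proposal_target_layer(...)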
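
The first two steps (IoU matrix, then per-roi argmax and max) can be sketched in plain NumPy. This is not the Cython bbox_overlaps from the repo; iou_matrix is a hypothetical stand-in that I assume matches it, including the "+1" pixel convention for widths and heights.

```python
import numpy as np

def iou_matrix(boxes, query_boxes):
    """Hypothetical pure-NumPy stand-in for bbox_overlaps: (N, 4) x (K, 4) -> (N, K) IoU."""
    b = np.asarray(boxes, dtype=np.float64)
    q = np.asarray(query_boxes, dtype=np.float64)
    area_b = (b[:, 2] - b[:, 0] + 1) * (b[:, 3] - b[:, 1] + 1)
    area_q = (q[:, 2] - q[:, 0] + 1) * (q[:, 3] - q[:, 1] + 1)
    # Pairwise intersection width/height, clipped at 0 for disjoint boxes
    iw = np.minimum(b[:, None, 2], q[None, :, 2]) - np.maximum(b[:, None, 0], q[None, :, 0]) + 1
    ih = np.minimum(b[:, None, 3], q[None, :, 3]) - np.maximum(b[:, None, 1], q[None, :, 1]) + 1
    inter = np.clip(iw, 0, None) * np.clip(ih, 0, None)
    return inter / (area_b[:, None] + area_q[None, :] - inter)

# Toy data: columns are [batch_ind, x1, y1, x2, y2] and [x1, y1, x2, y2, class]
all_rois = np.array([[0, 0, 0, 9, 9],
                     [0, 0, 0, 4, 9],
                     [0, 20, 20, 29, 29]], dtype=np.float64)
gt_boxes = np.array([[0, 0, 9, 9, 3],
                     [20, 20, 29, 29, 5]], dtype=np.float64)

overlaps = iou_matrix(all_rois[:, 1:5], gt_boxes[:, :4])  # R x G
gt_assignment = overlaps.argmax(axis=1)                    # best gt index per roi
max_overlaps = overlaps.max(axis=1)                        # that gt's IoU
labels = gt_boxes[gt_assignment, 4]                        # class of the matched gt

print(max_overlaps, labels)
```

The second roi covers exactly half of the first gt box, so it lands right on the 0.5 threshold and counts as foreground under the >= comparison.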

# Samples (128 by default) positive and negative proposals: rois, their gt class labels, and the regression targets bbox_targets and bbox_inside_weights
def _sample_rois(all_rois, gt_boxes, gt_ishard, dontcare_areas, fg_rois_per_image, rois_per_image, num_classes):
    """
    Generate a random sample of RoIs comprising foreground and background examples.
    """
    # overlaps: R x G, where R is the number of rois in all_rois and G is the number of gt boxes
    overlaps = bbox_overlaps(
        np.ascontiguousarray(all_rois[:, 1:5], dtype=np.float),
        np.ascontiguousarray(gt_boxes[:, :4], dtype=np.float))
    # For each roi in all_rois, the index of the gt box with the highest IoU
    gt_assignment = overlaps.argmax(axis=1)  # R
    # The corresponding max IoU value
    max_overlaps = overlaps.max(axis=1)  # R
    # The corresponding class label
    labels = gt_boxes[gt_assignment, 4]

    # Exclude hard examples
    # preclude hard samples
    ignore_inds = np.empty(shape=(0), dtype=int)
    # TRAIN.PRECLUDE_HARD_SAMPLES = True by default
    if cfg.TRAIN.PRECLUDE_HARD_SAMPLES and gt_ishard is not None and gt_ishard.shape[0] > 0:
        gt_ishard = gt_ishard.astype(int)
        gt_hardboxes = gt_boxes[gt_ishard == 1, :]
        if gt_hardboxes.shape[0] > 0:
            # R x H
            hard_overlaps = bbox_overlaps(
                np.ascontiguousarray(all_rois[:, 1:5], dtype=np.float),
                np.ascontiguousarray(gt_hardboxes[:, :4], dtype=np.float))
            # For each roi in all_rois, the max IoU with the gt_hardboxes
            hard_max_overlaps = hard_overlaps.max(axis=1)  # R
            # hard_gt_assignment = hard_overlaps.argmax(axis=0)  # H
            # TRAIN.FG_THRESH = 0.5 by default
            ignore_inds = np.append(ignore_inds, \
                                    np.where(hard_max_overlaps >= cfg.TRAIN.FG_THRESH)[0])
            # if DEBUG:
            #     if ignore_inds.size > 1:
            #         print 'num hard: {:d}:'.format(ignore_inds.size)
            #         print 'hard box:', gt_hardboxes
            #         print 'rois: '
            #         print all_rois[ignore_inds]

    # Exclude dontcare areas
    # preclude dontcare areas
    if dontcare_areas is not None and dontcare_areas.shape[0] > 0:
        # intersec shape is D x R
        intersecs = bbox_intersections(
            np.ascontiguousarray(dontcare_areas, dtype=np.float),  # D x 4
            np.ascontiguousarray(all_rois[:, 1:5], dtype=np.float)  # R x 4
        )
        # For each roi in all_rois, sum its intersections with all dontcare_areas
        intersecs_sum = intersecs.sum(axis=0)  # R x 1
        # TRAIN.DONTCARE_AREA_INTERSECTION_HI = 0.5 by default
        ignore_inds = np.append(ignore_inds, \
                                np.where(intersecs_sum > cfg.TRAIN.DONTCARE_AREA_INTERSECTION_HI)[0])
        # if ignore_inds.size >= 1:
        #     print 'num dontcare: {:d}:'.format(ignore_inds.size)
        #     print 'dontcare box:', dontcare_areas.astype(int)
        #     print 'rois: '
        #     print all_rois[ignore_inds].astype(int)

    # Select foreground RoIs as those with >= FG_THRESH overlap
    # TRAIN.FG_THRESH = 0.5 by default
    # max_overlaps: for each roi in all_rois, the max IoU with the gt boxes
    # Proposals whose max IoU with a gt box is >= 0.5 are positives
    fg_inds = np.where(max_overlaps >= cfg.TRAIN.FG_THRESH)[0]
    # np.setdiff1d() returns a sorted array of the unique elements that are in fg_inds but not in ignore_inds
    fg_inds = np.setdiff1d(fg_inds, ignore_inds)
    # Guard against the case when an image has fewer than fg_rois_per_image
    # foreground RoIs
    # fg_rois_per_image = 128 * 0.25 = 32 by default!!!
    fg_rois_per_this_image = min(fg_rois_per_image, fg_inds.size)
    # Sample foreground regions without replacement
    # Sample the foreground (positive) proposals!!!
    if fg_inds.size > 0:
        fg_inds = npr.choice(fg_inds, size=fg_rois_per_this_image, replace=False)
    # Select background RoIs as those within [BG_THRESH_LO, BG_THRESH_HI)
    # TRAIN.BG_THRESH_HI = 0.5 and TRAIN.BG_THRESH_LO = 0.1 by default
    # Proposals whose max IoU with a gt box lies in [0.1, 0.5) are negatives
    bg_inds = np.where((max_overlaps < cfg.TRAIN.BG_THRESH_HI) &
                       (max_overlaps >= cfg.TRAIN.BG_THRESH_LO))[0]
    bg_inds = np.setdiff1d(bg_inds, ignore_inds)
    # Compute number of background RoIs to take from this image (guarding
    # against there being fewer than desired)
    # rois_per_image = cfg.TRAIN.BATCH_SIZE / num_images = 128 / 1 = 128 by default
    # If there are fewer than 32 positive proposals, negative proposals make up the difference
    bg_rois_per_this_image = rois_per_image - fg_rois_per_this_image
    bg_rois_per_this_image = min(bg_rois_per_this_image, bg_inds.size)
    # Sample background regions without replacement
    if bg_inds.size > 0:
        bg_inds = npr.choice(bg_inds, size=bg_rois_per_this_image, replace=False)

    # The indices that we're selecting (both fg and bg)
    keep_inds = np.append(fg_inds, bg_inds)
    # Select sampled values from various arrays:
    labels = labels[keep_inds]
    # Clamp labels for the background RoIs to 0
    # Set the label of negative proposals to 0
    labels[fg_rois_per_this_image:] = 0
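    # The gt boxes themselves can be picked here, since gt_easyboxes were appended to all_rois (IoU 1.0 with themselves)?????? Still not sure about the intent behind all_rois
    # Positive and negative proposals sampled, 128 in total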
    rois = all_rois[keep_inds]
    # gt_assignment: for each roi in all_rois, the index of the gt box with the highest IoU
    # bbox_target_data.shape = (128, 5): column 1 is the class, columns 2 to 5 are the normalized regression targets of the proposals
    bbox_target_data = _compute_targets(
        rois[:, 1:5], gt_boxes[gt_assignment[keep_inds], :4], labels)

    # bbox_target_data (1 x H x W x A, 5)
    # bbox_targets <- (1 x H x W x A, K x 4)
    # bbox_inside_weights <- (1 x H x W x A, K x 4)
    bbox_targets, bbox_inside_weights = \
        _get_bbox_regression_labels(bbox_target_data, num_classes)
    # labels: 128 * 1
    # rois: 128 * 5, first column is the all-zero batch_ind
    # bbox_targets: 128 * (4K), K is the number of classes
    # bbox_inside_weights: 128 * (4K), K is the number of classes
    return labels, rois, bbox_targets, bbox_inside_weights
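
The foreground/background selection is easier to follow on a tiny example. The sketch below mirrors the thresholds and the setdiff1d/choice pattern of _sample_rois, but sample_indices is a hypothetical helper of mine: it takes the thresholds as arguments instead of reading cfg, and uses NumPy's Generator API rather than npr.choice.

```python
import numpy as np

def sample_indices(max_overlaps, ignore_inds, fg_rois_per_image, rois_per_image,
                   fg_thresh=0.5, bg_hi=0.5, bg_lo=0.1, rng=None):
    """Hypothetical helper: return (keep_inds, number_of_fg) for one image."""
    rng = np.random.default_rng() if rng is None else rng
    # Foreground: IoU >= fg_thresh, minus ignored rois
    fg_inds = np.setdiff1d(np.where(max_overlaps >= fg_thresh)[0], ignore_inds)
    n_fg = min(fg_rois_per_image, fg_inds.size)
    if fg_inds.size > 0:
        fg_inds = rng.choice(fg_inds, size=n_fg, replace=False)
    # Background: IoU in [bg_lo, bg_hi), minus ignored rois
    bg_mask = (max_overlaps < bg_hi) & (max_overlaps >= bg_lo)
    bg_inds = np.setdiff1d(np.where(bg_mask)[0], ignore_inds)
    n_bg = min(rois_per_image - n_fg, bg_inds.size)
    if bg_inds.size > 0:
        bg_inds = rng.choice(bg_inds, size=n_bg, replace=False)
    return np.append(fg_inds, bg_inds), n_fg

# Toy IoU values: roi 5 is below BG_THRESH_LO, roi 7 is ignored (e.g. overlaps a hard box)
max_overlaps = np.array([1.0, 0.7, 0.5, 0.3, 0.2, 0.05, 0.45, 0.6])
ignore_inds = np.array([7])

keep_inds, n_fg = sample_indices(max_overlaps, ignore_inds,
                                 fg_rois_per_image=2, rois_per_image=4,
                                 rng=np.random.default_rng(0))
print(keep_inds, n_fg)
```

Foreground indices always come first in keep_inds, which is why the original code can zero the background labels with labels[fg_rois_per_this_image:] = 0.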

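3. _get_bbox_regression_labels(bbox_target_data, num_classes)

Expands the bbox_target_data of the 128 proposals (128*5, column 1 is the gt class) into bbox_targets (128*(4K)): the slots of each proposal's class hold its regression targets and everything else is 0. It also builds bbox_inside_weights (128*(4K)), which is 1.0 1.0 1.0 1.0 in the slots of that class and 0 elsewhere.

# Expand the bbox_target_data of the 128 proposals (128*5, column 1 is the gt class) into bbox_targets (128*(4K)); the slots of the matching class hold the regression targets, everything else is 0
# Build bbox_inside_weights (128*(4K)): 1.0 1.0 1.0 1.0 in the slots of the matching class, 0 elsewhere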
def _get_bbox_regression_labels(bbox_target_data, num_classes):
    """
    Bounding-box regression targets (bbox_target_data) are stored in a compact form N x (class, tx, ty, tw, th)
    This function expands those targets into the 4-of-4*K representation used
    by the network (i.e. only one class has non-zero targets).
    Returns:
        bbox_target (ndarray): N x 4K blob of regression targets
        bbox_inside_weights (ndarray): N x 4K blob of loss weights
    """
    # The gt class of each proposal
    clss = bbox_target_data[:, 0]
    bbox_targets = np.zeros((clss.size, 4 * num_classes), dtype=np.float32)
    bbox_inside_weights = np.zeros(bbox_targets.shape, dtype=np.float32)
    # Indices of the proposals whose gt class is non-zero (foreground)
    inds = np.where(clss > 0)[0]
    for ind in inds:
        cls = int(clss[ind])
        start = 4 * cls
        end = start + 4
        bbox_targets[ind, start:end] = bbox_target_data[ind, 1:]
        # TRAIN.BBOX_INSIDE_WEIGHTS = (1.0, 1.0, 1.0, 1.0) by default
        bbox_inside_weights[ind, start:end] = cfg.TRAIN.BBOX_INSIDE_WEIGHTS
    return bbox_targets, bbox_inside_weights
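
A small worked example helps to see the "4-of-4*K" layout. The sketch below re-implements the same expansion with the inside weights passed in as an argument (expand_targets is a name I made up, and K = 3 is chosen only to keep the output short).

```python
import numpy as np

def expand_targets(bbox_target_data, num_classes, inside_weights=(1.0, 1.0, 1.0, 1.0)):
    """Hypothetical re-implementation: N x 5 compact targets -> two N x 4K arrays."""
    clss = bbox_target_data[:, 0].astype(int)
    targets = np.zeros((clss.size, 4 * num_classes), dtype=np.float32)
    weights = np.zeros_like(targets)
    for ind in np.where(clss > 0)[0]:        # background rows (class 0) stay all zero
        start = 4 * clss[ind]
        targets[ind, start:start + 4] = bbox_target_data[ind, 1:]
        weights[ind, start:start + 4] = inside_weights
    return targets, weights

data = np.array([[2, 0.1, 0.2, 0.3, 0.4],    # foreground proposal of class 2
                 [0, 0.5, 0.5, 0.5, 0.5]],   # background proposal
                dtype=np.float32)
targets, inside_w = expand_targets(data, num_classes=3)
# Same expression proposal_target_layer uses for the outside weights
outside_w = np.array(inside_w > 0).astype(np.float32)

print(targets)
print(inside_w)
```

Row 0 has its four targets in columns 8 to 11 (class 2), while row 1 is entirely zero, so a background proposal contributes nothing to the box regression loss.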

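4. _compute_targets(ex_rois, gt_rois, labels)

Computes the regression targets of the proposals (i.e. rois[:, 1:5]) from their matched gt boxes, and normalizes them with cfg.TRAIN.BBOX_NORMALIZE_MEANS and cfg.TRAIN.BBOX_NORMALIZE_STDS. The returned bbox_target_data has shape (128, 5): column 1 is the class and columns 2 to 5 are the normalized regression targets of the proposals. Called by _sample_rois(...).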

def _compute_targets(ex_rois, gt_rois, labels):
    """Compute bounding-box regression targets for an image."""
    assert ex_rois.shape[0] == gt_rois.shape[0]
    assert ex_rois.shape[1] == 4
    assert gt_rois.shape[1] == 4
    targets = bbox_transform(ex_rois, gt_rois)
    # TRAIN.BBOX_NORMALIZE_TARGETS_PRECOMPUTED = True
    # TRAIN.BBOX_NORMALIZE_MEANS = (0.0, 0.0, 0.0, 0.0), TRAIN.BBOX_NORMALIZE_STDS = (0.1, 0.1, 0.2, 0.2)!!!
    # Normalize the proposals' regression targets with cfg.TRAIN.BBOX_NORMALIZE_MEANS and cfg.TRAIN.BBOX_NORMALIZE_STDS!!!
    if cfg.TRAIN.BBOX_NORMALIZE_TARGETS_PRECOMPUTED:
        # Optionally normalize targets by a precomputed mean and stdev
        targets = ((targets - np.array(cfg.TRAIN.BBOX_NORMALIZE_MEANS))
                / np.array(cfg.TRAIN.BBOX_NORMALIZE_STDS))
    # The returned bbox_target_data has shape (128, 5): column 1 is the class, columns 2 to 5 are the normalized regression targets of the proposals
    return np.hstack(
            (labels[:, np.newaxis], targets)).astype(np.float32, copy=False)
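
bbox_transform itself is not shown in this file. The sketch below assumes it follows the usual Faster R-CNN parameterization (center offsets divided by the proposal size, log ratios of widths and heights, with the "+1" pixel convention); box_deltas is my own stand-in, and the means and stds are the defaults quoted in the comments above.

```python
import numpy as np

def box_deltas(ex_rois, gt_rois):
    """Assumed equivalent of bbox_transform: (N, 4) boxes -> (N, 4) [dx, dy, dw, dh]."""
    ex_w = ex_rois[:, 2] - ex_rois[:, 0] + 1.0
    ex_h = ex_rois[:, 3] - ex_rois[:, 1] + 1.0
    ex_cx = ex_rois[:, 0] + 0.5 * ex_w
    ex_cy = ex_rois[:, 1] + 0.5 * ex_h
    gt_w = gt_rois[:, 2] - gt_rois[:, 0] + 1.0
    gt_h = gt_rois[:, 3] - gt_rois[:, 1] + 1.0
    gt_cx = gt_rois[:, 0] + 0.5 * gt_w
    gt_cy = gt_rois[:, 1] + 0.5 * gt_h
    return np.stack(((gt_cx - ex_cx) / ex_w, (gt_cy - ex_cy) / ex_h,
                     np.log(gt_w / ex_w), np.log(gt_h / ex_h)), axis=1)

means = np.array((0.0, 0.0, 0.0, 0.0))   # TRAIN.BBOX_NORMALIZE_MEANS
stds = np.array((0.1, 0.1, 0.2, 0.2))    # TRAIN.BBOX_NORMALIZE_STDS

ex_rois = np.array([[0.0, 0.0, 9.0, 9.0]])    # 10 x 10 proposal
gt_rois = np.array([[2.0, 0.0, 11.0, 9.0]])   # same size, shifted 2 px to the right
labels = np.array([3.0])

targets = (box_deltas(ex_rois, gt_rois) - means) / stds
bbox_target_data = np.hstack((labels[:, np.newaxis], targets)).astype(np.float32)
print(bbox_target_data)
```

A 2 px shift on a 10 px wide proposal gives dx = 0.2, which the 0.1 std scales up to 2.0; the compact row is therefore [3, 2, 0, 0, 0].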

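5. _jitter_gt_boxes(gt_boxes, jitter=0.05)

Takes gt_easyboxes and adds an offset to the top-left and bottom-right corners: x coordinates get a width-based offset and y coordinates a height-based offset, with jitter coefficient jitter=0.05. Since x1 and x2 receive the same offset (likewise y1 and y2), each box is shifted by at most 2.5% of its width and height rather than resized. The code does not explain the purpose beyond its docstring, which says the jitter is meant to make classification and regression more robust. Called by proposal_target_layer(...).

# Jitter: the input is gt_easyboxes, the jitter coefficient is jitter=0.05
# Adds an offset to the top-left and bottom-right corners of gt_easyboxes: x coordinates get a width-based offset, y coordinates a height-based offset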
def _jitter_gt_boxes(gt_boxes, jitter=0.05):
    """
    jitter the gtboxes, before adding them into rois, to be more robust for cls and rgs
    gt_boxes: (G, 5) [x1 ,y1 ,x2, y2, class] int
    """
    jittered_boxes = gt_boxes.copy()
    ws = jittered_boxes[:, 2] - jittered_boxes[:, 0] + 1.0
    hs = jittered_boxes[:, 3] - jittered_boxes[:, 1] + 1.0
    width_offset = (np.random.rand(jittered_boxes.shape[0]) - 0.5) * jitter * ws
    height_offset = (np.random.rand(jittered_boxes.shape[0]) - 0.5) * jitter * hs
    jittered_boxes[:, 0] += width_offset
    jittered_boxes[:, 2] += width_offset
    jittered_boxes[:, 1] += height_offset
    jittered_boxes[:, 3] += height_offset
    return jittered_boxes
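
To check what the jitter actually does, here is a small experiment. jitter_boxes is my own variant of the function: it takes an optional random generator so the run is reproducible, and it casts to float first, which is an assumption on my part so that fractional offsets are preserved.

```python
import numpy as np

def jitter_boxes(gt_boxes, jitter=0.05, rng=None):
    """Hypothetical variant of _jitter_gt_boxes with an injectable random generator."""
    rng = np.random.default_rng() if rng is None else rng
    boxes = gt_boxes.astype(np.float64)           # astype returns a float copy
    ws = boxes[:, 2] - boxes[:, 0] + 1.0
    hs = boxes[:, 3] - boxes[:, 1] + 1.0
    dx = (rng.random(boxes.shape[0]) - 0.5) * jitter * ws   # within +/- 2.5% of width
    dy = (rng.random(boxes.shape[0]) - 0.5) * jitter * hs   # within +/- 2.5% of height
    boxes[:, [0, 2]] += dx[:, None]               # same shift for x1 and x2
    boxes[:, [1, 3]] += dy[:, None]               # same shift for y1 and y2
    return boxes

gt = np.array([[10, 20, 109, 69, 7]], dtype=np.float64)   # 100 x 50 box, class 7
jittered = jitter_boxes(gt, rng=np.random.default_rng(0))

shift_x = jittered[0, 0] - gt[0, 0]
shift_y = jittered[0, 1] - gt[0, 1]
print(shift_x, shift_y)
```

The box keeps its 100 x 50 size and its class column; only its position moves, by at most 2.5 px horizontally and 1.25 px vertically in this example.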

Source: https://www.cnblogs.com/deeplearning1314/p/11342959.html
