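This post is part of a series of source-code notes on the CharlesShang/TFFRCNN implementation on GitHub.
--------------- Personal study notes ---------------
---------------- Author: Wu Jiang (吴疆) --------------
------ Click here for the original post on cnblogs (博客园) ------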
1. Code logic of proposal_target_layer(rpn_rois, gt_boxes, gt_ishard, dontcare_areas, _num_classes)
Assign all_rois = rpn_rois, then remove the gt_hardboxes from gt_boxes to obtain gt_easyboxes --->
Extend all_rois (None, 5), whose first column is the all-zero batch_ind, so that it consists of three parts: rpn_rois + gt_easyboxes + jittered_gt_boxes. jittered_gt_boxes is obtained by jittering gt_easyboxes (via _jitter_gt_boxes(...)). The source comment only says "Include ground-truth boxes in the set of candidate rois"; my guess is that the proposals used for training should also contain the gt boxes, not only the proposals produced by the RPN, which helps train the RCNN subnet --->
Compute rois_per_image = cfg.TRAIN.BATCH_SIZE / num_images = 128 / 1 = 128 (RPN training uses 256 anchors, RCNN subnet training uses 128 proposals) --->
Call _sample_rois(...) to get labels (128*1), rois (128*5, first column is the all-zero batch_ind), the regression targets bbox_targets (128*4K, where K is the number of classes, 21 by default for PASCAL VOC) and bbox_inside_weights (128*4K) --->
Create bbox_outside_weights = np.array(bbox_inside_weights > 0).astype(np.float32) --->
Return rois, labels, bbox_targets, bbox_inside_weights, bbox_outside_weights
# Returns information about the 128 sampled proposals:
# rois (128*5, first column is the all-zero batch_ind);
# labels (128*1), the gt class label of each proposal (0 for negative proposals);
# bbox_targets (128*4K), regression targets, non-zero only at the gt class slot;
# bbox_inside_weights, [1,1,1,1] at the gt class slot and 0 elsewhere;
# bbox_outside_weights, [1,1,1,1] at the gt class slot and 0 elsewhere.
def proposal_target_layer(rpn_rois, gt_boxes, gt_ishard, dontcare_areas, _num_classes):
    """
    Assign object detection proposals to ground-truth targets. Produces proposal
    classification labels and bounding-box regression targets.
    Parameters
    ----------
    rpn_rois:  (1 x H x W x A, 5) [0, x1, y1, x2, y2]
               # the RPN actually outputs fewer than 1 x H x W x A rois:
               # 2000 during training, 300 during testing
    gt_boxes: (G, 5) [x1 ,y1 ,x2, y2, class] int
    gt_ishard: (G, 1) {0 | 1} 1 indicates hard
    dontcare_areas: (D, 4) [ x1, y1, x2, y2]
    _num_classes
    ----------
    Returns
    ----------
    rois: (1 x H x W x A, 5) [0, x1, y1, x2, y2]
    labels: (1 x H x W x A, 1) {0,1,...,_num_classes-1}
    bbox_targets: (1 x H x W x A, K x4) [dx1, dy1, dx2, dy2]
    bbox_inside_weights: (1 x H x W x A, Kx4) 0, 1 masks for the computing loss
    bbox_outside_weights: (1 x H x W x A, Kx4) 0, 1 masks for the computing loss
    """

    # Proposal ROIs (0, x1, y1, x2, y2) coming from RPN
    # (i.e., rpn.proposal_layer.ProposalLayer), or any other source
    all_rois = rpn_rois
    # TODO(rbg): it's annoying that sometimes I have extra info before
    # and other times after box coordinates -- normalize to one format

    # Include ground-truth boxes in the set of candidate rois
    # TRAIN.PRECLUDE_HARD_SAMPLES = True by default
    if cfg.TRAIN.PRECLUDE_HARD_SAMPLES and gt_ishard is not None and gt_ishard.shape[0] > 0:
        assert gt_ishard.shape[0] == gt_boxes.shape[0]
        gt_ishard = gt_ishard.astype(int)
        # Drop the hard gt boxes to get gt_easyboxes
        # (note: this differs from the handling in anchor_target_layer_tf.py)
        gt_easyboxes = gt_boxes[gt_ishard != 1, :]
    else:
        gt_easyboxes = gt_boxes

    """
    add the ground-truth to rois will cause zero loss!
    not good for visualization
    """
    jittered_gt_boxes = _jitter_gt_boxes(gt_easyboxes)
    zeros = np.zeros((gt_easyboxes.shape[0] * 2, 1), dtype=gt_easyboxes.dtype)
    # all_rois = original all_rois + gt_easyboxes + jittered_gt_boxes,
    # the last two with a batch_ind column of 0 prepended
    all_rois = np.vstack((all_rois, \
         np.hstack((zeros, np.vstack((gt_easyboxes[:, :-1], jittered_gt_boxes[:, :-1]))))))

    # batch_ind must be 0 everywhere
    # Sanity check: single batch only
    assert np.all(all_rois[:, 0] == 0), \
            'Only single item batches are supported'

    num_images = 1
    # TRAIN.BATCH_SIZE = 128 by default, unlike TRAIN.RPN_BATCHSIZE = 256
    rois_per_image = cfg.TRAIN.BATCH_SIZE / num_images
    # TRAIN.FG_FRACTION = 0.25 (1:3) by default, unlike TRAIN.RPN_FG_FRACTION = 0.5 (1:1)
    fg_rois_per_image = int(np.round(cfg.TRAIN.FG_FRACTION * rois_per_image))

    # Sample rois with classification labels and bounding box regression targets
    labels, rois, bbox_targets, bbox_inside_weights = _sample_rois(
        all_rois, gt_boxes, gt_ishard, dontcare_areas, fg_rois_per_image,
        rois_per_image, _num_classes)

    # _count = 1
    # if DEBUG:
    #     if _count == 1:
    #         _fg_num, _bg_num = 0, 0
    #     print 'num fg: {}'.format((labels > 0).sum())
    #     print 'num bg: {}'.format((labels == 0).sum())
    #     _count += 1
    #     _fg_num += (labels > 0).sum()
    #     _bg_num += (labels == 0).sum()
    #     print 'num fg avg: {}'.format(_fg_num / _count)
    #     print 'num bg avg: {}'.format(_bg_num / _count)
    #     print 'ratio: {:.3f}'.format(float(_fg_num) / float(_bg_num))

    rois = rois.reshape(-1, 5)
    labels = labels.reshape(-1, 1)
    bbox_targets = bbox_targets.reshape(-1, _num_classes*4)
    bbox_inside_weights = bbox_inside_weights.reshape(-1, _num_classes*4)

    bbox_outside_weights = np.array(bbox_inside_weights > 0).astype(np.float32)

    return rois, labels, bbox_targets, bbox_inside_weights, bbox_outside_weights
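To make the "extend all_rois" step concrete, here is a minimal sketch on toy data. The helper name build_candidate_rois and the sample boxes are made up for illustration, and the jittered copies are left out for brevity; it only shows how the non-hard gt boxes lose their class column, gain a batch index of 0, and get stacked under the RPN proposals.

```python
import numpy as np

def build_candidate_rois(rpn_rois, gt_boxes, gt_ishard=None):
    """Hypothetical helper: append the non-hard gt boxes to the RPN proposals."""
    if gt_ishard is not None and gt_ishard.shape[0] > 0:
        keep = gt_ishard.astype(int).ravel() != 1   # True for easy gt boxes
        gt_easy = gt_boxes[keep]
    else:
        gt_easy = gt_boxes
    # gt boxes are (x1, y1, x2, y2, class): drop the class, prepend batch index 0
    zeros = np.zeros((gt_easy.shape[0], 1), dtype=gt_easy.dtype)
    gt_as_rois = np.hstack((zeros, gt_easy[:, :4]))
    return np.vstack((rpn_rois, gt_as_rois))

# Toy inputs (assumed values, not taken from any dataset)
rpn_rois = np.array([[0, 10, 10, 50, 50],
                     [0, 30, 30, 90, 90]], dtype=np.float32)
gt_boxes = np.array([[12, 12, 48, 48, 7],
                     [100, 100, 150, 150, 3]], dtype=np.float32)
gt_ishard = np.array([0, 1])        # the second gt box is marked hard

all_rois = build_candidate_rois(rpn_rois, gt_boxes, gt_ishard)
print(all_rois.shape)               # 2 proposals + 1 easy gt box
print(all_rois[-1])                 # the appended gt box as (0, x1, y1, x2, y2)
```

Because an appended gt box has IoU 1.0 with itself, it is always a foreground candidate in the later sampling step.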
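2. Code logic of _sample_rois(all_rois, gt_boxes, gt_ishard, dontcare_areas, fg_rois_per_image, rois_per_image, num_classes)
Call bbox_overlaps(...) (in utils/cython_bbox.so) to compute the IoU between all_rois[:, 1:5] and gt_boxes[:, :4] --->
For each roi in all_rois, get gt_assignment (the index of the gt box with the max IoU) and max_overlaps (that max IoU value), then use gt_assignment to look up each roi's gt class label, labels --->
Exclude hard examples: call bbox_overlaps(...) (in utils/cython_bbox.so) to compute the IoU between all_rois[:, 1:5] and gt_hardboxes[:, :4]; any roi whose max IoU with the gt_hardboxes is >= 0.5 (TRAIN.FG_THRESH) is excluded --->
Exclude dontcare areas: call bbox_intersections(...) (in utils/cython_bbox.so) to compute the intersection between dontcare_areas and all_rois[:, 1:5]; any roi whose summed intersection over all dontcare_areas is > 0.5 (TRAIN.DONTCARE_AREA_INTERSECTION_HI) is excluded --->
Positive proposal sampling: proposals whose max IoU with a gt box is >= 0.5 are positives; randomly sample up to 32 of them, and if fewer than 32 exist the shortfall is filled with negatives (target positive:negative ratio 1:3, 128 samples in total) --->
Negative proposal sampling: proposals whose max IoU with a gt box lies in [0.1, 0.5) are negatives; randomly sample 96 of them (more when positives fall short) ---> (Note: since all_rois also contains the easy gt boxes themselves, each with IoU 1.0 against its own gt box, sampling can pick the gt boxes as positives; this seems to be the point of extending all_rois)
Update labels, shape (128, 1), setting the label of negative proposals to 0; update rois, shape (128, 5) --->
Call _compute_targets(...) to compute the normalized regression targets bbox_target_data, shape (128, 5): column 1 is the class, columns 2-5 are the normalized regression targets of the proposals --->
Call _get_bbox_regression_labels(...) to expand the bbox_target_data of the 128 proposals (128*5, column 1 is the gt class) into bbox_targets (128*(4K)), where the slot of the gt class holds the regression targets and everything else is 0, and build bbox_inside_weights (128*(4K)), where the slot of the gt class is 1.0 1.0 1.0 1.0 and everything else is 0 --->
Return labels (128*1), rois (128*5, first column is the all-zero batch_ind), the regression targets bbox_targets (128*4K, where K is the number of classes, 21 by default for PASCAL VOC) and bbox_inside_weights (128*4K); called by proposal_target_layer(...)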
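The IoU matching in the first two steps can be sketched in pure NumPy. This is only an illustration, not the Cython bbox_overlaps itself: the function name iou_matrix and the toy boxes are assumptions, and the inclusive-pixel "+1" area convention is assumed to match the one used by the original Faster R-CNN code.

```python
import numpy as np

def iou_matrix(rois, gts):
    """Pairwise IoU (R x G), assuming inclusive pixel coordinates (+1 convention)."""
    x1 = np.maximum(rois[:, None, 0], gts[None, :, 0])
    y1 = np.maximum(rois[:, None, 1], gts[None, :, 1])
    x2 = np.minimum(rois[:, None, 2], gts[None, :, 2])
    y2 = np.minimum(rois[:, None, 3], gts[None, :, 3])
    iw = np.clip(x2 - x1 + 1, 0, None)      # zero width when boxes do not overlap
    ih = np.clip(y2 - y1 + 1, 0, None)
    inter = iw * ih
    area_r = (rois[:, 2] - rois[:, 0] + 1) * (rois[:, 3] - rois[:, 1] + 1)
    area_g = (gts[:, 2] - gts[:, 0] + 1) * (gts[:, 3] - gts[:, 1] + 1)
    return inter / (area_r[:, None] + area_g[None, :] - inter)

# Toy data: 3 rois (x1, y1, x2, y2) and 2 gt boxes (x1, y1, x2, y2, class)
rois = np.array([[0, 0, 9, 9], [0, 0, 4, 9], [20, 20, 29, 29]], dtype=np.float64)
gt_boxes = np.array([[0, 0, 9, 9, 3], [18, 18, 29, 29, 5]], dtype=np.float64)

overlaps = iou_matrix(rois, gt_boxes[:, :4])    # R x G
gt_assignment = overlaps.argmax(axis=1)         # best gt index per roi
max_overlaps = overlaps.max(axis=1)             # best IoU per roi
labels = gt_boxes[gt_assignment, 4]             # class of the best gt box
print(gt_assignment, max_overlaps, labels)
```

In this toy case the second roi covers exactly half of the first gt box, so its max IoU is 0.5 and it sits right on the FG_THRESH boundary, which the `>=` comparison counts as foreground.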
# Samples (128 by default) positive and negative proposals and returns rois, the
# corresponding gt class labels, the regression targets bbox_targets and bbox_inside_weights
def _sample_rois(all_rois, gt_boxes, gt_ishard, dontcare_areas, fg_rois_per_image, rois_per_image, num_classes):
    """
    Generate a random sample of RoIs comprising foreground and background examples.
    """
    # overlaps: R x G, R is the number of rois in all_rois, G the number of gt boxes
    overlaps = bbox_overlaps(
        np.ascontiguousarray(all_rois[:, 1:5], dtype=np.float),
        np.ascontiguousarray(gt_boxes[:, :4], dtype=np.float))
    # For each roi in all_rois, the index of the gt box with the max IoU
    gt_assignment = overlaps.argmax(axis=1)  # R
    # The corresponding max IoU value
    max_overlaps = overlaps.max(axis=1)  # R
    # The corresponding class label
    labels = gt_boxes[gt_assignment, 4]

    # preclude hard samples
    ignore_inds = np.empty(shape=(0), dtype=int)
    # TRAIN.PRECLUDE_HARD_SAMPLES = True by default
    if cfg.TRAIN.PRECLUDE_HARD_SAMPLES and gt_ishard is not None and gt_ishard.shape[0] > 0:
        gt_ishard = gt_ishard.astype(int)
        gt_hardboxes = gt_boxes[gt_ishard == 1, :]
        if gt_hardboxes.shape[0] > 0:
            # R x H
            hard_overlaps = bbox_overlaps(
                np.ascontiguousarray(all_rois[:, 1:5], dtype=np.float),
                np.ascontiguousarray(gt_hardboxes[:, :4], dtype=np.float))
            # For each roi in all_rois, the max IoU with the gt_hardboxes
            hard_max_overlaps = hard_overlaps.max(axis=1)  # R
            # hard_gt_assignment = hard_overlaps.argmax(axis=0)  # H
            # TRAIN.FG_THRESH = 0.5 by default
            ignore_inds = np.append(ignore_inds, \
                                    np.where(hard_max_overlaps >= cfg.TRAIN.FG_THRESH)[0])
            # if DEBUG:
            #     if ignore_inds.size > 1:
            #         print 'num hard: {:d}:'.format(ignore_inds.size)
            #         print 'hard box:', gt_hardboxes
            #         print 'rois: '
            #         print all_rois[ignore_inds]

    # preclude dontcare areas
    if dontcare_areas is not None and dontcare_areas.shape[0] > 0:
        # intersec shape is D x R
        intersecs = bbox_intersections(
            np.ascontiguousarray(dontcare_areas, dtype=np.float),  # D x 4
            np.ascontiguousarray(all_rois[:, 1:5], dtype=np.float)  # R x 4
        )
        # For each roi in all_rois, the sum of its intersections with all dontcare_areas
        intersecs_sum = intersecs.sum(axis=0)  # R x 1
        # TRAIN.DONTCARE_AREA_INTERSECTION_HI = 0.5 by default
        ignore_inds = np.append(ignore_inds, \
                                np.where(intersecs_sum > cfg.TRAIN.DONTCARE_AREA_INTERSECTION_HI)[0])
        # if ignore_inds.size >= 1:
        #     print 'num dontcare: {:d}:'.format(ignore_inds.size)
        #     print 'dontcare box:', dontcare_areas.astype(int)
        #     print 'rois: '
        #     print all_rois[ignore_inds].astype(int)

    # Select foreground RoIs as those with >= FG_THRESH overlap
    # TRAIN.FG_THRESH = 0.5 by default
    # max_overlaps: for each roi in all_rois, the max IoU with the gt boxes
    # proposals with max IoU >= 0.5 are positives
    fg_inds = np.where(max_overlaps >= cfg.TRAIN.FG_THRESH)[0]
    # np.setdiff1d() returns the sorted array of elements in fg_inds that are not in ignore_inds
    fg_inds = np.setdiff1d(fg_inds, ignore_inds)
    # Guard against the case when an image has fewer than fg_rois_per_image
    # foreground RoIs
    # fg_rois_per_image = 128 * 0.25 = 32 by default
    fg_rois_per_this_image = min(fg_rois_per_image, fg_inds.size)
    # Sample foreground regions without replacement
    if fg_inds.size > 0:
        fg_inds = npr.choice(fg_inds, size=fg_rois_per_this_image, replace=False)

    # Select background RoIs as those within [BG_THRESH_LO, BG_THRESH_HI)
    # TRAIN.BG_THRESH_HI = 0.5, TRAIN.BG_THRESH_LO = 0.1 by default
    # proposals with max IoU in [0.1, 0.5) are negatives
    bg_inds = np.where((max_overlaps < cfg.TRAIN.BG_THRESH_HI) &
                       (max_overlaps >= cfg.TRAIN.BG_THRESH_LO))[0]
    bg_inds = np.setdiff1d(bg_inds, ignore_inds)
    # Compute number of background RoIs to take from this image (guarding
    # against there being fewer than desired)
    # rois_per_image = cfg.TRAIN.BATCH_SIZE / num_images = 128 / 1 = 128 by default
    # if there are fewer than 32 positives, negatives fill the shortfall
    bg_rois_per_this_image = rois_per_image - fg_rois_per_this_image
    bg_rois_per_this_image = min(bg_rois_per_this_image, bg_inds.size)
    # Sample background regions without replacement
    if bg_inds.size > 0:
        bg_inds = npr.choice(bg_inds, size=bg_rois_per_this_image, replace=False)

    # The indices that we're selecting (both fg and bg)
    keep_inds = np.append(fg_inds, bg_inds)
    # Select sampled values from various arrays:
    labels = labels[keep_inds]
    # Clamp labels for the background RoIs to 0
    labels[fg_rois_per_this_image:] = 0
    # The sampled positives and negatives (128 in total); because all_rois
    # contains the easy gt boxes, these can be sampled as positives too
    rois = all_rois[keep_inds]

    # gt_assignment: for each roi in all_rois, the index of the gt box with the max IoU
    # bbox_target_data.shape = (128, 5): column 1 is the class,
    # columns 2-5 are the normalized regression targets
    bbox_target_data = _compute_targets(
        rois[:, 1:5], gt_boxes[gt_assignment[keep_inds], :4], labels)

    # bbox_target_data (1 x H x W x A, 5)
    # bbox_targets <- (1 x H x W x A, K x 4)
    # bbox_inside_weights <- (1 x H x W x A, K x 4)
    bbox_targets, bbox_inside_weights = \
        _get_bbox_regression_labels(bbox_target_data, num_classes)
    # labels: 128 * 1
    # rois: 128 * 5, first column is the all-zero batch_ind
    # bbox_targets: 128 * (4K), K is the number of classes
    # bbox_inside_weights: 128 * (4K), K is the number of classes
    return labels, rois, bbox_targets, bbox_inside_weights
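The quota arithmetic of the sampling step is easy to check on synthetic IoU values. The sketch below is a simplified stand-in, not the original function: sample_fg_bg and its keyword names are hypothetical, the hard-example and dontcare filtering is omitted, and NumPy's newer Generator API is used in place of npr.choice.

```python
import numpy as np

def sample_fg_bg(max_overlaps, rois_per_image=128, fg_fraction=0.25,
                 fg_thresh=0.5, bg_hi=0.5, bg_lo=0.1, rng=None):
    """Hypothetical sketch of the fg/bg sampling quotas (no ignore_inds handling)."""
    rng = np.random.default_rng() if rng is None else rng
    fg_quota = int(np.round(fg_fraction * rois_per_image))       # 32 by default
    fg_inds = np.where(max_overlaps >= fg_thresh)[0]
    bg_inds = np.where((max_overlaps < bg_hi) & (max_overlaps >= bg_lo))[0]
    n_fg = min(fg_quota, fg_inds.size)
    n_bg = min(rois_per_image - n_fg, bg_inds.size)   # negatives fill the shortfall
    if n_fg > 0:
        fg_inds = rng.choice(fg_inds, size=n_fg, replace=False)
    if n_bg > 0:
        bg_inds = rng.choice(bg_inds, size=n_bg, replace=False)
    return fg_inds[:n_fg], bg_inds[:n_bg]

# Synthetic IoUs: 10 clear positives, 200 negatives, 50 rois below BG_THRESH_LO
max_overlaps = np.concatenate([np.full(10, 0.8), np.full(200, 0.3), np.full(50, 0.05)])
fg, bg = sample_fg_bg(max_overlaps, rng=np.random.default_rng(0))
print(len(fg), len(bg))   # only 10 positives exist, so 118 negatives are drawn
```

The rois with IoU below 0.1 are neither positives nor negatives and are never sampled.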
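3. _get_bbox_regression_labels(bbox_target_data, num_classes)
Expands the bbox_target_data of the 128 proposals (128*5, column 1 is the gt class) into bbox_targets (128*(4K)), where the slot of the gt class holds the regression targets and everything else is 0, and builds bbox_inside_weights (128*(4K)), where the slot of the gt class is 1.0 1.0 1.0 1.0 and everything else is 0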
# Expands the bbox_target_data of the 128 proposals (128*5, column 1 is the gt class)
# into bbox_targets (128*(4K)): regression targets at the gt class slot, 0 elsewhere
# Builds bbox_inside_weights (128*(4K)): 1.0 1.0 1.0 1.0 at the gt class slot, 0 elsewhere
def _get_bbox_regression_labels(bbox_target_data, num_classes):
    """
    Bounding-box regression targets (bbox_target_data) are stored in a
    compact form N x (class, tx, ty, tw, th)
    This function expands those targets into the 4-of-4*K representation used
    by the network (i.e. only one class has non-zero targets).
    Returns:
        bbox_target (ndarray): N x 4K blob of regression targets
        bbox_inside_weights (ndarray): N x 4K blob of loss weights
    """
    # gt class of each proposal
    clss = bbox_target_data[:, 0]
    bbox_targets = np.zeros((clss.size, 4 * num_classes), dtype=np.float32)
    bbox_inside_weights = np.zeros(bbox_targets.shape, dtype=np.float32)
    # indices of the proposals whose gt class is non-zero (foreground)
    inds = np.where(clss > 0)[0]
    for ind in inds:
        cls = int(clss[ind])
        start = 4 * cls
        end = start + 4
        bbox_targets[ind, start:end] = bbox_target_data[ind, 1:]
        # TRAIN.BBOX_INSIDE_WEIGHTS = (1.0, 1.0, 1.0, 1.0) by default
        bbox_inside_weights[ind, start:end] = cfg.TRAIN.BBOX_INSIDE_WEIGHTS
    return bbox_targets, bbox_inside_weights
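A tiny worked example makes the 4-of-4K layout easier to see. The sketch below repeats the same expansion on two made-up rows with K = 3; the function name expand_targets and the inside_weights argument (standing in for cfg.TRAIN.BBOX_INSIDE_WEIGHTS) are assumptions for illustration.

```python
import numpy as np

def expand_targets(bbox_target_data, num_classes, inside_weights=(1.0, 1.0, 1.0, 1.0)):
    """Hypothetical stand-alone version of the 4-of-4K expansion."""
    clss = bbox_target_data[:, 0]
    targets = np.zeros((clss.size, 4 * num_classes), dtype=np.float32)
    weights = np.zeros(targets.shape, dtype=np.float32)
    for ind in np.where(clss > 0)[0]:        # background rows (class 0) stay all-zero
        start = 4 * int(clss[ind])
        targets[ind, start:start + 4] = bbox_target_data[ind, 1:]
        weights[ind, start:start + 4] = inside_weights
    return targets, weights

# Row 0: foreground of class 2; row 1: background (its target values are ignored)
bbox_target_data = np.array([[2, 0.1, 0.2, 0.3, 0.4],
                             [0, 9.0, 9.0, 9.0, 9.0]], dtype=np.float32)
targets, inside = expand_targets(bbox_target_data, num_classes=3)
outside = np.array(inside > 0).astype(np.float32)   # as in proposal_target_layer

print(targets[0])   # non-zero only in columns 8..11 (the slot of class 2)
print(inside[1])    # all zeros: background contributes no regression loss
```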
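4. _compute_targets(ex_rois, gt_rois, labels)
Computes the regression targets of the proposals (i.e. rois[:, 1:5]) with respect to their assigned gt boxes and normalizes them with cfg.TRAIN.BBOX_NORMALIZE_MEANS and cfg.TRAIN.BBOX_NORMALIZE_STDS. The returned bbox_target_data has shape (128, 5): column 1 is the class, columns 2-5 are the normalized regression targets of the proposals. Called by _sample_rois(...)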
def _compute_targets(ex_rois, gt_rois, labels):
    """Compute bounding-box regression targets for an image."""

    assert ex_rois.shape[0] == gt_rois.shape[0]
    assert ex_rois.shape[1] == 4
    assert gt_rois.shape[1] == 4

    targets = bbox_transform(ex_rois, gt_rois)
    # TRAIN.BBOX_NORMALIZE_TARGETS_PRECOMPUTED = True
    # TRAIN.BBOX_NORMALIZE_MEANS = (0.0, 0.0, 0.0, 0.0), TRAIN.BBOX_NORMALIZE_STDS = (0.1, 0.1, 0.2, 0.2)
    # Normalize the regression targets with these means and stds
    if cfg.TRAIN.BBOX_NORMALIZE_TARGETS_PRECOMPUTED:
        # Optionally normalize targets by a precomputed mean and stdev
        targets = ((targets - np.array(cfg.TRAIN.BBOX_NORMALIZE_MEANS))
                / np.array(cfg.TRAIN.BBOX_NORMALIZE_STDS))
    # The returned bbox_target_data.shape = (128, 5): column 1 is the class,
    # columns 2-5 are the normalized regression targets
    return np.hstack(
            (labels[:, np.newaxis], targets)).astype(np.float32, copy=False)
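bbox_transform itself is not shown in this post. The sketch below assumes it follows the usual Faster R-CNN parameterization (center offsets scaled by the proposal size, log ratios of width and height, inclusive "+1" widths); the function names and the toy boxes are hypothetical, while the means and stds are the defaults quoted in the comments above.

```python
import numpy as np

def bbox_transform_sketch(ex_rois, gt_rois):
    """Assumed standard Faster R-CNN box parameterization (tx, ty, tw, th)."""
    ew = ex_rois[:, 2] - ex_rois[:, 0] + 1.0
    eh = ex_rois[:, 3] - ex_rois[:, 1] + 1.0
    ecx = ex_rois[:, 0] + 0.5 * ew
    ecy = ex_rois[:, 1] + 0.5 * eh
    gw = gt_rois[:, 2] - gt_rois[:, 0] + 1.0
    gh = gt_rois[:, 3] - gt_rois[:, 1] + 1.0
    gcx = gt_rois[:, 0] + 0.5 * gw
    gcy = gt_rois[:, 1] + 0.5 * gh
    return np.stack([(gcx - ecx) / ew, (gcy - ecy) / eh,
                     np.log(gw / ew), np.log(gh / eh)], axis=1)

def compute_targets_sketch(ex_rois, gt_rois, labels,
                           means=(0.0, 0.0, 0.0, 0.0), stds=(0.1, 0.1, 0.2, 0.2)):
    """Normalize the raw targets and prepend the class label column."""
    targets = (bbox_transform_sketch(ex_rois, gt_rois) - np.array(means)) / np.array(stds)
    return np.hstack((labels[:, np.newaxis], targets)).astype(np.float32)

# One 10x10 proposal whose gt box is the same size but shifted 2 px to the right
ex_rois = np.array([[0.0, 0.0, 9.0, 9.0]])
gt_rois = np.array([[2.0, 0.0, 11.0, 9.0]])
labels = np.array([7.0])
bbox_target_data = compute_targets_sketch(ex_rois, gt_rois, labels)
print(bbox_target_data)   # raw tx = 2 / 10 = 0.2, which becomes 2.0 after dividing by std 0.1
```

Dividing by the small stds rescales the targets to roughly unit variance, so a 2-pixel shift on a 10-pixel box shows up as a target of 2.0 rather than 0.2.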
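5. _jitter_gt_boxes(gt_boxes, jitter=0.05)
The argument passed in is gt_easyboxes. An offset is added to the top-left and bottom-right corners: the x coordinates get a width-based offset and the y coordinates a height-based offset, with jitter coefficient jitter=0.05. Both corners receive the same offset, so each box is shifted without changing its size. I was unsure of the purpose; the function's docstring says it makes classification and regression more robust. Called by proposal_target_layer(...)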
# Jitter: the argument is gt_easyboxes, the jitter coefficient is jitter=0.05
# Adds an offset to the top-left and bottom-right corners of gt_easyboxes:
# a width-based offset for x coordinates, a height-based offset for y coordinates
def _jitter_gt_boxes(gt_boxes, jitter=0.05):
    """ jitter the gtboxes, before adding them into rois, to be more robust for cls and rgs
    gt_boxes: (G, 5) [x1 ,y1 ,x2, y2, class] int
    """
    jittered_boxes = gt_boxes.copy()
    ws = jittered_boxes[:, 2] - jittered_boxes[:, 0] + 1.0
    hs = jittered_boxes[:, 3] - jittered_boxes[:, 1] + 1.0
    width_offset = (np.random.rand(jittered_boxes.shape[0]) - 0.5) * jitter * ws
    height_offset = (np.random.rand(jittered_boxes.shape[0]) - 0.5) * jitter * hs
    jittered_boxes[:, 0] += width_offset
    jittered_boxes[:, 2] += width_offset
    jittered_boxes[:, 1] += height_offset
    jittered_boxes[:, 3] += height_offset

    return jittered_boxes
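How large is the jitter in practice? Since the random factor lies in [-0.5, 0.5), each box moves by at most 2.5% of its width horizontally and 2.5% of its height vertically. The sketch below checks this on one made-up box; jitter_boxes is a hypothetical re-implementation that uses NumPy's Generator API so the run can be seeded.

```python
import numpy as np

def jitter_boxes(boxes, jitter=0.05, rng=None):
    """Hypothetical seeded variant of the jitter step: shift boxes, keep their size."""
    rng = np.random.default_rng() if rng is None else rng
    out = boxes.astype(np.float64)                    # astype returns a copy
    ws = out[:, 2] - out[:, 0] + 1.0
    hs = out[:, 3] - out[:, 1] + 1.0
    dx = (rng.random(out.shape[0]) - 0.5) * jitter * ws
    dy = (rng.random(out.shape[0]) - 0.5) * jitter * hs
    out[:, [0, 2]] += dx[:, None]                     # same shift for x1 and x2
    out[:, [1, 3]] += dy[:, None]                     # same shift for y1 and y2
    return out

# One 100 x 50 box of class 7 (assumed toy values)
gt_boxes = np.array([[10.0, 20.0, 109.0, 69.0, 7.0]])
jittered = jitter_boxes(gt_boxes, rng=np.random.default_rng(0))

shift_x = jittered[0, 0] - gt_boxes[0, 0]
shift_y = jittered[0, 1] - gt_boxes[0, 1]
print(shift_x, shift_y)                   # at most 2.5 px and 1.25 px in magnitude
print(jittered[0, 2] - jittered[0, 0])    # width difference is unchanged
```

With shifts this small the jittered copy still overlaps its gt box almost completely, so it acts as an extra, slightly imperfect positive candidate.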