Fast R-CNN

R-CNN is slow because it performs a ConvNet forward pass for each object proposal, without sharing computation. Spatial pyramid pooling networks (SPPnets) were proposed to speed up R-CNN by sharing computation. The SPPnet method computes a convolutional feature map for the entire input image and then classifies each object proposal using a feature vector extracted from the shared feature map. Features are extracted for a proposal by max-pooling the portion of the feature map inside the proposal into a fixed-size output (e.g., 6 × 6). Multiple output sizes are pooled and then concatenated as in spatial pyramid pooling.

  • R-CNN -> SPP-Net
    1. R-CNN is slow because it performs a ConvNet forward pass for each region proposal without sharing computation.
    2. SPP-Net computes a convolutional feature map for the entire input image, then classifies each region proposal using a feature vector extracted from that shared feature map.

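As a rough illustration of this fixed-size pooling (my own sketch, not the paper's code), the snippet below max-pools the portion of a shared conv feature map inside one proposal into a 6 × 6 output with PyTorch. It assumes the RoI is already given in feature-map coordinates and shows only a single pyramid level; SPPnet additionally pools several grid sizes and concatenates them.

```python
import torch
import torch.nn.functional as F

def pool_roi(feature_map, roi, output_size=(6, 6)):
    """Max-pool the part of a shared conv feature map inside one RoI into a fixed-size output.

    feature_map: (C, H, W) conv features of the whole image.
    roi: (x1, y1, x2, y2), assumed to already be in feature-map coordinates.
    """
    x1, y1, x2, y2 = roi
    region = feature_map[:, y1:y2 + 1, x1:x2 + 1]      # crop the RoI from the shared map
    # Adaptive max pooling divides the region into an output_size grid and takes the max
    # in each cell, giving a fixed-length feature regardless of the RoI's size.
    return F.adaptive_max_pool2d(region, output_size)

# Example: one proposal on a 512-channel feature map, pooled to 6x6 (a single SPP level).
features = torch.randn(512, 38, 50)
fixed = pool_roi(features, (10, 5, 30, 20))            # -> shape (512, 6, 6)
```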
SPPnet also has notable drawbacks. Like R-CNN, training is a multi-stage pipeline that involves extracting features, fine-tuning a network with log loss, training SVMs, and finally fitting bounding-box regressors. Features are also written to disk. But unlike R-CNN, the fine-tuning algorithm proposed in SPP-Net cannot update the convolutional layers that precede the spatial pyramid pooling. Unsurprisingly, this limitation (fixed convolutional layers) limits the accuracy of very deep networks.

  • SPP-Net -> Fast R-CNN
    1. SPP-Net training is a multi-stage pipeline, and features must be written to disk.
    2. SPP-Net cannot update the convolutional layers that precede the spatial pyramid pooling layer.

We propose a new training algorithm that fixes the disadvantages of R-CNN and SPPnet, while improving on their speed and accuracy. We call this method Fast R-CNN because it’s comparatively fast to train and test. The Fast R-CNN method has several advantages:

  1. Higher detection quality (mAP) than R-CNN, SPPnet
  2. Training is single-stage, using a multi-task loss
  3. Training can update all network layers
  4. No disk storage is required for feature caching

Fast R-CNN summary

  1. Fast R-CNN training is single-stage, using a multi-task loss (sketched below).
  2. All network layers can be updated, and no disk storage is needed for feature caching.

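For the multi-task loss, the paper combines a log loss over the K+1 classes with a smooth-L1 bounding-box regression loss that is only counted for non-background RoIs, L = L_cls(p, u) + λ·[u ≥ 1]·L_loc(t^u, v). A minimal per-minibatch sketch of this idea (not the paper's code), assuming the predicted box deltas have already been selected for each RoI's true class:

```python
import torch
import torch.nn.functional as F

def fast_rcnn_loss(class_scores, bbox_pred_for_true_class, labels, bbox_targets, lam=1.0):
    """Multi-task loss sketch: log loss over K+1 classes + smooth-L1 box loss for foreground RoIs.

    class_scores: (R, K+1) raw class scores per RoI.
    bbox_pred_for_true_class: (R, 4) predicted box deltas for each RoI's true class (assumed pre-selected).
    labels: (R,) ground-truth class per RoI, 0 = background.
    bbox_targets: (R, 4) regression targets.
    """
    cls_loss = F.cross_entropy(class_scores, labels)
    foreground = labels > 0                              # [u >= 1]: background RoIs get no box loss
    if foreground.any():
        loc_loss = F.smooth_l1_loss(bbox_pred_for_true_class[foreground],
                                    bbox_targets[foreground])
    else:
        loc_loss = class_scores.new_zeros(())
    return cls_loss + lam * loc_loss
```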
Fig. 1 illustrates the Fast R-CNN architecture. A Fast R-CNN network takes as input an entire image and a set of object proposals. The network first processes the whole image with several convolutional (conv) and max pooling layers to produce a conv feature map. Then, for each object proposal a region of interest (RoI) pooling layer extracts a fixed-length feature vector from the feature map. Each feature vector is fed into a sequence of fully connected (fc) layers that finally branch into two sibling output layers: one that produces softmax probability estimates over K object classes plus a catch-all “background” class and another layer that outputs four real-valued numbers for each of the K object classes. Each set of 4 values encodes refined bounding-box positions for one of the K classes.

  • Fast R-CNN architecture
    1. The entire image and a set of object proposals are taken as input.
    2. The whole image is processed with conv and max pooling layers to produce a conv feature map.
    3. For each object proposal, a RoI pooling layer extracts a fixed-length RoI feature vector directly from the conv feature map.
    4. Finally, each RoI feature vector goes through fully connected layers that branch into two sibling outputs:
      • a softmax for classification;
      • a regressor that outputs bounding-box positions for each class.

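A hypothetical end-to-end sketch of this data flow, using torchvision's roi_pool and a tiny placeholder backbone; the layer sizes here are illustrative and are not the paper's network:

```python
import torch
import torch.nn as nn
from torchvision.ops import roi_pool

class TinyFastRCNN(nn.Module):
    """Illustrative data flow only: shared conv features -> RoI pooling -> fc -> two sibling outputs."""

    def __init__(self, num_classes=20, channels=256, pool_size=7):
        super().__init__()
        # Placeholder backbone (NOT the paper's network): one strided conv standing in for
        # the "several conv and max pooling layers" that produce a 1/16-resolution feature map.
        self.backbone = nn.Sequential(nn.Conv2d(3, channels, 3, stride=16, padding=1), nn.ReLU())
        self.fc = nn.Sequential(nn.Flatten(),
                                nn.Linear(channels * pool_size * pool_size, 1024), nn.ReLU())
        self.cls_score = nn.Linear(1024, num_classes + 1)   # K object classes + "background"
        self.bbox_pred = nn.Linear(1024, 4 * num_classes)   # 4 refined box values per class
        self.pool_size = pool_size

    def forward(self, images, rois):
        # images: (N, 3, H, W); rois: (M, 5) rows of (batch_index, x1, y1, x2, y2) in image coordinates
        feats = self.backbone(images)                        # one whole-image conv pass, shared by all RoIs
        pooled = roi_pool(feats, rois, output_size=self.pool_size, spatial_scale=1.0 / 16)
        x = self.fc(pooled)
        return self.cls_score(x).softmax(dim=1), self.bbox_pred(x)

# Usage: two RoIs on one 512x512 image.
net = TinyFastRCNN()
image = torch.randn(1, 3, 512, 512)
rois = torch.tensor([[0, 10.0, 10, 200, 300], [0, 50, 60, 400, 480]])
probs, boxes = net(image, rois)          # probs: (2, 21), boxes: (2, 80)
```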
First, the last max pooling layer is replaced by a RoI pooling layer that is configured by setting H and W to be compatible with the net’s first fully connected layer (e.g., H = W = 7 for VGG16).

Second, the network’s last fully connected layer and softmax (which were trained for 1000-way ImageNet classification) are replaced with the two sibling layers described earlier (a fully connected layer and softmax over K+1 categories and category-specific bounding-box regressors).

Third, the network is modified to take two data inputs: a list of images and a list of RoIs in those images.

  • R-CNN -> Fast R-CNN
    1. The last max pooling layer is replaced by a RoI pooling layer.
    2. The last fully connected layer and softmax are replaced by the two sibling output layers.
    3. The network is modified to take two data inputs: a list of images and a list of RoIs in those images.

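The three modifications might look roughly like the sketch below when starting from torchvision's pre-trained VGG16. The slicing assumes torchvision's layout, in which the last module of `features` is the final max pool and the last module of `classifier` is the 1000-way ImageNet layer; this is an assumed adaptation, not the paper's code.

```python
import torch.nn as nn
from torchvision import models
from torchvision.ops import roi_pool

K = 20                                    # e.g. number of object classes (PASCAL VOC)
vgg = models.vgg16(weights="IMAGENET1K_V1")

# 1. Drop VGG16's final max pooling layer; RoI pooling to 7x7 takes its place,
#    so the pooled features still match the first fully connected layer (H = W = 7).
conv_body = nn.Sequential(*list(vgg.features.children())[:-1])

# 2. Keep the pre-trained fc layers but drop the 1000-way ImageNet classifier,
#    then add the two sibling output layers.
fc_head   = nn.Sequential(nn.Flatten(), *list(vgg.classifier.children())[:-1])
cls_score = nn.Linear(4096, K + 1)        # softmax over K classes + background
bbox_pred = nn.Linear(4096, 4 * K)        # category-specific bounding-box regressors

def forward(images, rois):
    # 3. The network now takes two inputs: a batch of images and a list of RoIs,
    #    each RoI given as (batch_index, x1, y1, x2, y2) in image coordinates.
    feats  = conv_body(images)
    pooled = roi_pool(feats, rois, output_size=7, spatial_scale=1.0 / 16)
    x = fc_head(pooled)
    return cls_score(x), bbox_pred(x)
```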
Training all network weights with back-propagation is an important capability of Fast R-CNN. First, let’s elucidate why SPPnet is unable to update weights below the spatial pyramid pooling layer. The root cause is that back-propagation through the SPP layer is highly inefficient when each training sample (i.e. RoI) comes from a different image, which is exactly how R-CNN and SPPnet networks are trained. The inefficiency stems from the fact that each RoI may have a very large receptive field, often spanning the entire input image. Since the forward pass must process the entire receptive field, the training inputs are large (often the entire image).

  • Why SPP-Net cannot update the conv layers before the spatial pyramid pooling layer
    1. The root cause is that back-propagation through the SPP layer is very inefficient when each training RoI comes from a different image.
    2. The inefficiency stems from the fact that each RoI may have a very large receptive field, often spanning the entire input image, so the forward pass must process (nearly) the whole image.

We propose a more efficient training method that takes advantage of feature sharing during training. In Fast R-CNN training, stochastic gradient descent (SGD) minibatches are sampled hierarchically, first by sampling N images and then by sampling R/N RoIs from each image. Critically, RoIs from the same image share computation and memory in the forward and backward passes. Making N small decreases mini-batch computation.

  • How Fast R-CNN fixes the inefficient back-propagation
    1. Feature sharing: RoIs from the same image share computation and memory in the forward and backward passes.
    2. Hierarchical sampling: sample N images first, then R/N RoIs from each image (see the sketch below).

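A toy sketch of this hierarchical sampling with the paper's setting N = 2 and R = 128 (so 64 RoIs per image); the dataset layout below is an assumption made for illustration only.

```python
import random

def sample_minibatch(dataset, N=2, R=128):
    """Hierarchical sampling: first pick N images, then R/N RoIs from each.

    All RoIs drawn from the same image share one conv forward/backward pass,
    which is what makes back-propagation through the RoI pooling layer affordable.

    dataset: list of dicts like {"image": ..., "proposals": [roi, ...]} (assumed layout).
    """
    rois_per_image = R // N
    images = random.sample(dataset, N)
    minibatch = []
    for entry in images:
        rois = random.sample(entry["proposals"], rois_per_image)
        minibatch.append((entry["image"], rois))
    return minibatch   # N images x (R / N) RoIs = R training RoIs per SGD step
```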
References:
[1] R. Girshick. Fast R-CNN. In ICCV, 2015.


@qingdujun
2018-5-25, Huairou, Beijing
