Introduction
Earlier posts covered many object detection algorithms, in particular the two-stage family represented by R-CNN. But the slowness of two-stage detectors is a problem that cannot be ignored, and it is fatal in scenarios with strict real-time requirements. YOLO (You Only Look Once) was created to solve the speed problem: it regresses bounding-box locations and class probabilities directly at the output layer, turning detection into a one-stage task.
Original paper: https://arxiv.org/pdf/1804.02767.pdf
The Evolution of YOLOv3
- YOLOv1 (2015, CVPR):
The backbone borrows from GoogLeNet. The input image is divided directly into 7×7 = 49 grid cells, each predicting two bounding boxes. Because the grid is fairly coarse, overall precision is limited and small objects are detected poorly.
Drawbacks: the input image must have a fixed size, because the output layer is fully connected; each grid cell predicts only one class; the loss penalizes confidence errors on small and large objects equally, whereas errors on small objects should be penalized more heavily.
- YOLOv2 (2016):
The backbone is DarkNet-19. Dropout is abandoned in favor of BatchNorm throughout; the fully connected layers are removed and box locations are regressed via anchor boxes, whose sizes are determined by k-means clustering; multi-scale training is used, since the fully convolutional network can handle inputs of different resolutions.
- YOLOv3 (2018):
The backbone is DarkNet-53, which introduces residual blocks. Multi-scale prediction (FPN-style) is added, and class prediction uses per-class logistic outputs instead of softmax (see the sketch below).
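The switch from softmax to independent logistic outputs matters when labels overlap (e.g. "person" and "woman"): per-class sigmoids let one box score high on several classes at once, which softmax forbids. A minimal PyTorch illustration (values are made up):

import torch

logits = torch.tensor([2.0, 1.8, -3.0])  # raw class scores for one box

print(torch.softmax(logits, dim=0))  # ~[0.548, 0.448, 0.004]: classes compete, probabilities sum to 1
print(torch.sigmoid(logits))         # ~[0.881, 0.858, 0.047]: several classes can be high at once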
1. YOLOv3 Network Structure
A 416×416×3 input image passes through the Darknet backbone and produces predictions at three different scales; each scale's output has N channels that carry the prediction information.
YOLOv1 makes 7×7×2 predictions;
YOLOv2 makes 13×13×5 predictions;
YOLOv3 makes 13×13×3 + 26×26×3 + 52×52×3 predictions in total.
Each prediction consists of 4 (box coordinates x, y, w, h) + 1 (objectness confidence) + 80 (class scores) = 85 values.
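With 80 COCO classes, each scale's output therefore has 3 × 85 = 255 channels. A quick sanity check of the shapes and the total prediction count (a sketch assuming a 416×416 input):

anchors_per_scale = 3
per_anchor = 4 + 1 + 80  # x, y, w, h + confidence + 80 classes = 85

total = 0
for grid in (13, 26, 52):  # strides 32, 16 and 8 on a 416x416 input
    print(f"{grid}x{grid}x{anchors_per_scale * per_anchor}")  # e.g. 13x13x255
    total += grid * grid * anchors_per_scale
print(total)  # 10647 predicted boxes in total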
2. Training YOLOv3
- Use the k-means algorithm to determine the sizes of your own anchors (a note on the distance metric follows the script):
# -*- coding: utf-8 -*-
import os
import sys

import numpy as np

# Root folder of the dataset
ROOT_PATH = './data/'
# Number of clusters (= number of anchors)
CLUSTERS = 9
# Input size of the network (assumed square)
SIZE = 416


# Load YOLO-format annotations (class cx cy w h, all normalized to [0, 1])
def load_dataset(path):
    images_dir = os.path.join(path, 'images')
    if not os.path.exists(images_dir):
        print('no images folder, program abort')
        sys.exit(0)
    labels_dir = os.path.join(path, 'labels')
    if not os.path.exists(labels_dir):
        print('no labels folder, program abort')
        sys.exit(0)
    label_files = os.listdir(labels_dir)
    print('label count: {}'.format(len(label_files)))
    dataset = []
    for label in label_files:
        with open(os.path.join(labels_dir, label), 'r') as f:
            for line in f.readlines():
                line_split = line.split(' ')
                # Width and height are the last two fields of each line
                roi_width = float(line_split[-2])
                roi_height = float(line_split[-1])
                if roi_width == 0 or roi_height == 0:
                    continue
                dataset.append([roi_width, roi_height])
    return np.array(dataset)


def iou(box, clusters):
    # IoU between one box and each cluster center, assuming all boxes share a corner
    x = np.minimum(clusters[:, 0], box[0])
    y = np.minimum(clusters[:, 1], box[1])
    if np.count_nonzero(x == 0) > 0 or np.count_nonzero(y == 0) > 0:
        raise ValueError("Box has no area")
    intersection = x * y
    box_area = box[0] * box[1]
    cluster_area = clusters[:, 0] * clusters[:, 1]
    return intersection / (box_area + cluster_area - intersection)


def avg_iou(boxes, clusters):
    # Mean of the best IoU each box achieves against the cluster centers
    return np.mean([np.max(iou(boxes[i], clusters)) for i in range(boxes.shape[0])])


def translate_boxes(boxes):
    # Convert (x1, y1, x2, y2) boxes to (width, height)
    new_boxes = boxes.copy()
    for row in range(new_boxes.shape[0]):
        new_boxes[row][2] = np.abs(new_boxes[row][2] - new_boxes[row][0])
        new_boxes[row][3] = np.abs(new_boxes[row][3] - new_boxes[row][1])
    return np.delete(new_boxes, [0, 1], axis=1)


def kmeans(boxes, k, dist=np.median):
    rows = boxes.shape[0]
    distances = np.empty((rows, k))
    last_clusters = np.zeros((rows,))
    np.random.seed()
    # The Forgy method will fail if the whole array contains the same rows
    clusters = boxes[np.random.choice(rows, k, replace=False)]
    while True:
        for row in range(rows):
            # Distance metric: d = 1 - IoU
            distances[row] = 1 - iou(boxes[row], clusters)
        nearest_clusters = np.argmin(distances, axis=1)
        if (last_clusters == nearest_clusters).all():
            break
        for cluster in range(k):
            clusters[cluster] = dist(boxes[nearest_clusters == cluster], axis=0)
        last_clusters = nearest_clusters
    return clusters


data = load_dataset(ROOT_PATH)
out = kmeans(data, k=CLUSTERS)
print(out)
print("Accuracy: {:.2f}%".format(avg_iou(data, out) * 100))
print("Boxes:\n {}-{}".format(out[:, 0] * SIZE, out[:, 1] * SIZE))
ratios = np.around(out[:, 0] / out[:, 1], decimals=2).tolist()
print("Ratios:\n {}".format(sorted(ratios)))
- Computing the loss function:
YOLOv3's loss consists of three parts:
1. The error in x, y, w, h, i.e. the bbox loss
2. The error in the objectness confidence, i.e. the obj loss
3. The error in the class prediction, i.e. the class loss
In the code these correspond to lbox, lobj and lcls; the loss from the paper is reproduced below:
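The formula image from the original post is not reproduced here; for reference, the classic YOLO loss from the YOLOv1 paper, from which the three parts above descend (YOLOv3 swaps the squared-error objectness and class terms for binary cross-entropy), can be written as:

$$
\begin{aligned}
\mathcal{L} ={} & \lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}} \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 \right] \\
& + \lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}} \left[ (\sqrt{w_i} - \sqrt{\hat{w}_i})^2 + (\sqrt{h_i} - \sqrt{\hat{h}_i})^2 \right] \\
& + \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}} (C_i - \hat{C}_i)^2
  + \lambda_{\text{noobj}} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{noobj}} (C_i - \hat{C}_i)^2 \\
& + \sum_{i=0}^{S^2} \mathbb{1}_{i}^{\text{obj}} \sum_{c \in \text{classes}} (p_i(c) - \hat{p}_i(c))^2
\end{aligned}
$$

The first two rows correspond to lbox, the third row to lobj, and the last row to lcls.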
The PyTorch implementation on GitHub makes some changes:
- lbox part:
The yolov3 code provides IoU, GIoU, DIoU and CIoU as box-overlap measures; here we use GIoU (a self-contained sketch follows the snippet):
# Inside bbox_iou(): the GIoU branch
if GIoU:  # Generalized IoU https://arxiv.org/pdf/1902.09630.pdf
    c_area = cw * ch + 1e-16  # area of the smallest enclosing (convex) box
    return iou - (c_area - union) / c_area  # GIoU

# Inside compute_loss():
giou = bbox_iou(pbox.t(), tbox[i], x1y1x2y2=False, GIoU=True)  # giou computation
lbox += (1.0 - giou).sum() if red == 'sum' else (1.0 - giou).mean()  # giou loss
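For reference, here is a minimal self-contained GIoU computation for axis-aligned boxes in (x1, y1, x2, y2) format; this is a sketch, not the ultralytics bbox_iou implementation, which also handles xywh input and the DIoU/CIoU variants:

import torch

def giou_xyxy(box1, box2, eps=1e-16):
    # Intersection width/height, clamped at zero for non-overlapping boxes
    inter_w = (torch.min(box1[:, 2], box2[:, 2]) - torch.max(box1[:, 0], box2[:, 0])).clamp(0)
    inter_h = (torch.min(box1[:, 3], box2[:, 3]) - torch.max(box1[:, 1], box2[:, 1])).clamp(0)
    inter = inter_w * inter_h
    # Union of the two box areas
    area1 = (box1[:, 2] - box1[:, 0]) * (box1[:, 3] - box1[:, 1])
    area2 = (box2[:, 2] - box2[:, 0]) * (box2[:, 3] - box2[:, 1])
    union = area1 + area2 - inter + eps
    iou = inter / union
    # Smallest enclosing box
    cw = torch.max(box1[:, 2], box2[:, 2]) - torch.min(box1[:, 0], box2[:, 0])
    ch = torch.max(box1[:, 3], box2[:, 3]) - torch.min(box1[:, 1], box2[:, 1])
    c_area = cw * ch + eps
    return iou - (c_area - union) / c_area  # GIoU in [-1, 1]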
- lobj part:
lobj is the objectness confidence, i.e. the probability that the bounding box contains an object.

BCEobj = nn.BCEWithLogitsLoss(pos_weight=ft([h['obj_pw']]), reduction=red)
lobj += BCEobj(pi[..., 4], tobj)  # obj loss
- lcls part:

BCEcls = nn.BCEWithLogitsLoss(pos_weight=ft([h['cls_pw']]), reduction=red)
if model.nc > 1:  # cls loss (only if multiple classes)
    t = torch.full_like(ps[:, 5:], cn)  # targets filled with the negative value cn
    t[range(nb), tcls[i]] = cp  # positive value cp at each ground-truth class
    lcls += BCEcls(ps[:, 5:], t)  # BCE
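Here cn and cp are the negative and positive target values used for label smoothing; in the ultralytics code they come from a helper along these lines (by default eps = 0.0, giving cp = 1.0 and cn = 0.0):

def smooth_BCE(eps=0.1):
    # Label-smoothing BCE targets, https://arxiv.org/pdf/1902.04103.pdf
    # Returns the positive and negative target values
    return 1.0 - 0.5 * eps, 0.5 * eps

cp, cn = smooth_BCE(eps=0.0)  # cp = 1.0, cn = 0.0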
(The original post included a diagram here that shows the whole loss computation at a glance.)
torch.nn.BCELoss computes the cross-entropy for binary classification and can be viewed as a special case of CrossEntropyLoss.
torch.nn.BCEWithLogitsLoss is equivalent to Sigmoid + BCELoss: the input is first passed through a sigmoid, turning it into a probability.
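A minimal check of that equivalence (illustrative values only):

import torch
import torch.nn as nn

logits = torch.randn(4, 3)                     # raw, unbounded network outputs
targets = torch.randint(0, 2, (4, 3)).float()  # binary labels

loss_a = nn.BCELoss()(torch.sigmoid(logits), targets)  # Sigmoid + BCELoss
loss_b = nn.BCEWithLogitsLoss()(logits, targets)       # fused, numerically stabler
assert torch.allclose(loss_a, loss_b, atol=1e-6)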
3. YOLOv3 Inference
- Improving the NMS algorithm:
Depending on the application scenario, the predicted boxes exhibit different kinds of errors, so adapt the non_max_suppression function as the situation requires (a usage sketch follows the function):
import torch
import torchvision

# Note: xywh2xyxy() and box_iou() are helpers from the ultralytics/yolov3 utils.

def non_max_suppression(prediction, conf_thres=0.3, iou_thres=0.5, multi_label=True, classes=None, agnostic=False):
    """
    Performs Non-Maximum Suppression on inference results

    Returns detections with shape:
        nx6 (x1, y1, x2, y2, conf, cls)
    """
    # Box constraints; note that max_wh is reused below as the per-class box offset,
    # so it must be at least as large as the largest surviving box coordinate
    min_wh, max_wh = 10, 100  # (pixels) minimum and maximum box width and height
    method = 'merge'
    nc = prediction[0].shape[1] - 5  # number of classes
    multi_label &= nc > 1  # multiple labels per box
    output = [None] * len(prediction)
    for xi, x in enumerate(prediction):  # image index, image inference
        # Apply conf constraint
        x = x[x[:, 4] > conf_thres]

        # Apply width-height constraint
        x = x[((x[:, 2:4] > min_wh) & (x[:, 2:4] < max_wh)).all(1)]

        # Apply area constraint (w * h in pixels)
        areas = ((x[:, 2:3] * x[:, 3:4] > 225) & (x[:, 2:3] * x[:, 3:4] < 10000)).all(1)
        x = x[areas]

        # Apply aspect-ratio constraint: remove overly narrow or wide boxes
        r = ((x[:, 2:3] / x[:, 3:4] > 0.4) & (x[:, 2:3] / x[:, 3:4] < 2.5)).all(1)
        x = x[r]

        # If none remain process next image
        if not x.shape[0]:
            continue

        # Compute conf
        x[..., 5:] *= x[..., 4:5]  # conf = obj_conf * cls_conf
        x = x[x[:, 5] > 0.20]  # extra hard filter on the first class score

        # Constraint on confidence-weighted area
        cc = ((x[:, 5:] * x[:, 2:3] * x[:, 3:4] > 360) & (x[:, 5:] * x[:, 2:3] * x[:, 3:4] < 10000)).all(1)
        x = x[cc]
        if not x.shape[0]:
            continue

        # Box (center x, center y, width, height) to (x1, y1, x2, y2)
        box = xywh2xyxy(x[:, :4])

        # Detections matrix nx6 (xyxy, conf, cls)
        if multi_label:
            i, j = (x[:, 5:] > conf_thres).nonzero().t()
            x = torch.cat((box[i], x[i, j + 5].unsqueeze(1), j.float().unsqueeze(1)), 1)
        else:  # best class only
            conf, j = x[:, 5:].max(1)
            x = torch.cat((box, conf.unsqueeze(1), j.float().unsqueeze(1)), 1)

        # Filter by class
        if classes:
            x = x[(j.view(-1, 1) == torch.tensor(classes, device=j.device)).any(1)]

        # Apply finite constraint
        if not torch.isfinite(x).all():
            x = x[torch.isfinite(x).all(1)]

        # If none remain process next image
        n = x.shape[0]  # number of boxes
        if not n:
            continue

        # Sort by confidence
        # if method == 'fast_batch':
        #     x = x[x[:, 4].argsort(descending=True)]

        # Batched NMS: offset boxes by class so different classes never suppress each other
        c = x[:, 5] * 0 if agnostic else x[:, 5]  # classes
        boxes, scores = x[:, :4].clone() + c.view(-1, 1) * max_wh, x[:, 4]  # boxes (offset by class), scores
        if method == 'merge':  # Merge NMS (boxes merged using weighted mean)
            i = torchvision.ops.boxes.nms(boxes, scores, iou_thres)
            if 1 < n < 3E3:  # update boxes as boxes(i,4) = weights(i,n) * boxes(n,4)
                try:
                    # weights = (box_iou(boxes, boxes).tril_() > iou_thres) * scores.view(-1, 1)  # box weights
                    # weights /= weights.sum(0)  # normalize
                    # x[:, :4] = torch.mm(weights.T, x[:, :4])
                    weights = (box_iou(boxes[i], boxes) > iou_thres) * scores[None]  # box weights
                    x[i, :4] = torch.mm(weights, x[:, :4]).float() / weights.sum(1, keepdim=True)  # merged boxes
                except:  # possible CUDA error https://github.com/ultralytics/yolov3/issues/1139
                    pass
        elif method == 'vision':
            i = torchvision.ops.boxes.nms(boxes, scores, iou_thres)
        elif method == 'fast':  # FastNMS from https://github.com/dbolya/yolact
            iou = box_iou(boxes, boxes).triu_(diagonal=1)  # upper triangular iou matrix
            i = iou.max(0)[0] < iou_thres

        output[xi] = x[i]
    return output
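A typical call, sketched under the assumption that model and img come from the usual ultralytics/yolov3 detect pipeline (both names are illustrative, not part of the function above):

# Hypothetical usage: img is a normalized 1x3x416x416 tensor, model a loaded Darknet model
pred = model(img)[0]  # raw predictions, shape (1, 10647, 85)
det = non_max_suppression(pred, conf_thres=0.3, iou_thres=0.5)[0]  # nx6 tensor for image 0
if det is not None:
    for *xyxy, conf, cls in det:
        print(xyxy, conf.item(), int(cls.item()))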
Source: oschina
Link: https://my.oschina.net/u/4309973/blog/4391343