PyTorch Object Detection (Part 5)

SSD in Practice

1. Start by running someone else's code

https://github.com/amdegroot/ssd.pytorch
Download the code. The repository's Markdown file includes a tutorial, but the code itself has a few problems.

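Summary of problems

I trained on the VOC2012 training set only.

Remove the 07 entry from the imageset parameter of the __init__ function in voc0712.py.

Running train.py then fails with: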

RuntimeError: CUDA out of memory. Tried to allocate 176.00 MiB (GPU 0; 2.00 GiB total capacity; 1.23 GiB already allocated; 107.80 MiB free; 1.24 GiB reserved in total by PyTorch)

The GPU is too weak (only 2 GiB of memory), so reduce batch_size.

IndexError: The shape of the mask [8, 8732] at index 0 does not match the shape of the indexed tensor [69856, 1] at index 0

The loss dimensions do not match in ssd.pytorch\layers\modules\multibox_loss.py.

Fix: swap lines 97 and 98 of layers/modules/multibox_loss.py, so that the reshape comes before the masking:

    loss_c = loss_c.view(num, -1)
    loss_c[pos] = 0  # filter out pos boxes for now
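Why the order matters: pos is a boolean mask of shape [batch, num_priors], while loss_c is still flattened to [batch * num_priors, 1] until the view is applied. The sketch below reproduces the mismatch with NumPy arrays standing in for the tensors (an assumption, made so it runs without PyTorch or a GPU); the shapes are shrunk from [8, 8732] to [2, 3].

```python
import numpy as np

num, num_priors = 2, 3   # stand-ins for batch size 8 and 8732 priors
loss_c = np.arange(num * num_priors, dtype=np.float32).reshape(-1, 1)  # shape (6, 1)
pos = np.array([[True, False, False],
                [False, True, False]])   # shape (2, 3), marks the positive boxes

# Wrong order: masking before reshaping fails because the shapes disagree.
try:
    loss_c[pos] = 0
    mask_error = None
except IndexError as err:
    mask_error = err

# Right order: reshape to (num, num_priors) first, then zero out the positives.
loss_c = loss_c.reshape(num, -1)
loss_c[pos] = 0
```

After the reshape, the mask and the loss have the same shape, so the assignment zeroes exactly one entry per positive box.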

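Change N = num_pos.data.sum() on line 114 to

    loc_loss += loss_l.data.item()
    conf_loss += loss_c.data.item()

These two statements accumulate the running losses, which in the upstream repository happens in train.py rather than in multibox_loss.py; there they replace the older loss_l.data[0] and loss_c.data[0], which fail on 0-dim tensors in recent PyTorch versions.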

Training produces a loss of nan

Lower the learning rate.
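To see why a smaller learning rate can help, here is a toy illustration. It is not the SSD training loop: it runs plain gradient descent on f(x) = x^2, and the function name final_loss and both learning rates are chosen only for the demonstration. With a step that is too large the iterate overshoots, grows until it overflows, and the loss ends up as nan; with a small step it converges.

```python
import math


def final_loss(lr, steps=1100, x=1.0):
    """Minimise f(x) = x * x with plain gradient descent and return the last loss."""
    for _ in range(steps):
        grad = 2.0 * x        # derivative of x * x
        x = x - lr * grad     # gradient descent update
    return x * x


loss_high_lr = final_loss(lr=1.5)   # each step multiplies x by -2: overflow, then nan
loss_low_lr = final_loss(lr=0.1)    # each step multiplies x by 0.8: converges to 0

print(math.isnan(loss_high_lr), loss_low_lr < 1e-6)
```

In a real network the mechanism is less tidy, but the symptom is the same: updates that are too large push the weights and activations out of the finite range.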

StopIteration

Calling next() on an exhausted iterator raises this error; recreating the iterator fixes it.

        try:
            images, targets = next(batch_iterator)
        except StopIteration as e:
            batch_iterator = iter(data_loader)
            images, targets = next(batch_iterator)
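The same reset can be wrapped in a small generator so the training loop never sees StopIteration. This is a minimal sketch, not code from the repository: cycle_batches is a hypothetical helper name, and a plain list stands in for the DataLoader.

```python
def cycle_batches(data_loader):
    """Yield batches forever, restarting the loader whenever it is exhausted."""
    batch_iterator = iter(data_loader)
    while True:
        try:
            batch = next(batch_iterator)
        except StopIteration:
            # One pass over the data has finished: build a fresh iterator.
            batch_iterator = iter(data_loader)
            batch = next(batch_iterator)
        yield batch


# A list stands in for torch.utils.data.DataLoader (assumption for the demo).
fake_loader = [("images_0", "targets_0"), ("images_1", "targets_1")]
batches = cycle_batches(fake_loader)
first_five = [next(batches) for _ in range(5)]
```

Here first_five walks through the two batches, wraps around, and continues with the first batch again. The sketch assumes the loader is not empty.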

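Data preprocessing

SSD applies very rich data augmentation, which improves detection of small and occluded objects. The pipeline can be split into photometric transforms and geometric transforms: photometric transforms do not change the image size, while geometric transforms mainly change its dimensions. Mean subtraction comes last. Most of the operations are applied randomly.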
The augmentation pipeline is implemented in augmentations.py:

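    class SSDAugmentation(object):
        def __init__(self, size=300, mean=(104, 117, 123)):
            self.mean = mean
            self.size = size
            self.augment = Compose([
                ConvertFromInts(),         # convert pixel values from integers to floats
                ToAbsoluteCoords(),        # convert box coordinates from fractions to absolute pixels
                PhotometricDistort(),      # random brightness, contrast and saturation changes; random channel swap
                Expand(self.mean),         # randomly enlarge the canvas; the image is shifted right and down by a random offset
                RandomSampleCrop(),        # randomly crop the image
                RandomMirror(),            # random horizontal flip
                ToPercentCoords(),         # convert absolute coordinates back to fractions
                Resize(self.size),         # resize to a fixed 300x300
                SubtractMeans(self.mean)   # subtract the mean
            ])

        def __call__(self, img, boxes, labels):
            return self.augment(img, boxes, labels)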
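The pipeline relies on every transform sharing one call signature, (image, boxes, labels) in and out, so that Compose can chain them. The sketch below is a simplified reimplementation for illustration, not the repository's code: only the two coordinate conversions are included, and they return new arrays instead of modifying boxes in place.

```python
import numpy as np


class Compose:
    """Apply a list of transforms in order to (image, boxes, labels)."""

    def __init__(self, transforms):
        self.transforms = transforms

    def __call__(self, img, boxes=None, labels=None):
        for t in self.transforms:
            img, boxes, labels = t(img, boxes, labels)
        return img, boxes, labels


class ToAbsoluteCoords:
    """Scale [xmin, ymin, xmax, ymax] from fractions of the image to pixels."""

    def __call__(self, image, boxes=None, labels=None):
        height, width, _ = image.shape
        return image, boxes * np.array([width, height, width, height]), labels


class ToPercentCoords:
    """Scale [xmin, ymin, xmax, ymax] from pixels back to fractions."""

    def __call__(self, image, boxes=None, labels=None):
        height, width, _ = image.shape
        return image, boxes / np.array([width, height, width, height]), labels


image = np.zeros((100, 200, 3), dtype=np.float32)   # height 100, width 200
boxes = np.array([[0.25, 0.5, 0.75, 1.0]])          # fractional coordinates

_, abs_boxes, _ = ToAbsoluteCoords()(image, boxes)
_, round_trip, _ = Compose([ToAbsoluteCoords(), ToPercentCoords()])(image, boxes)
```

The geometric transforms sit between the two conversions because cropping and expanding are easier to express in pixel coordinates, while the final resize leaves fractional coordinates unchanged.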

Brightness adjustment in the photometric transforms

    class RandomBrightness(object):
        def __init__(self, delta=32):
            assert delta >= 0.0
            assert delta <= 255.0
            self.delta = delta

        def __call__(self, image, boxes=None, labels=None):
            if random.randint(2):  # `random` is numpy.random here: returns 0 or 1
                delta = random.uniform(-self.delta, self.delta)
                image += delta
            return image, boxes, labels

With probability 0.5, a single value drawn uniformly from the interval [-32, 32) is added to every element of the image.
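A small sketch of the brightness shift on its own, assuming a float image as produced by ConvertFromInts. The seeded generator rng and the tiny 2x2 image are choices made for the demonstration, not part of the repository. The point is that one scalar is drawn per image, so every pixel and channel moves by the same amount.

```python
import numpy as np

rng = np.random.default_rng(0)                      # seeded only for reproducibility
image = np.full((2, 2, 3), 128, dtype=np.float32)   # float image, as after ConvertFromInts

delta = rng.uniform(-32, 32)    # one scalar per image, not one per pixel
brightened = image + delta      # broadcast: the same offset everywhere

shift = brightened - image      # constant array equal to delta
```

No clipping happens at this step, so values can leave the [0, 255] range.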

    class RandomLightingNoise(object):
        def __init__(self):
            self.perms = ((0, 1, 2), (0, 2, 1),
                          (1, 0, 2), (1, 2, 0),
                          (2, 0, 1), (2, 1, 0))

        def __call__(self, image, boxes=None, labels=None):
            if random.randint(2):
                swap = self.perms[random.randint(len(self.perms))]
                shuffle = SwapChannels(swap)  # shuffle channels
                image = shuffle(image)
            return image, boxes, labels
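What a channel swap does to the data can be shown in a few lines. This sketch assumes that SwapChannels reorders the last axis of an H x W x C array by indexing it with the permutation; the fixed permutation (2, 0, 1) is picked for the demonstration instead of being drawn at random.

```python
import numpy as np

image = np.zeros((1, 2, 3), dtype=np.float32)
image[:] = [10, 20, 30]          # every pixel has channel values 10, 20, 30

perm = (2, 0, 1)                 # one of the six permutations in self.perms
swapped = image[:, :, perm]      # index the last axis to reorder the channels
```

Each pixel now holds the channel values in the order 30, 10, 20; the spatial layout is untouched, so the boxes need no adjustment.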
    class Expand(object):
        def __init__(self, mean):
            self.mean = mean

        def __call__(self, image, boxes, labels):
            if random.randint(2):
                return image, boxes, labels

            height, width, depth = image.shape
            ratio = random.uniform(1, 4)
            left = random.uniform(0, width * ratio - width)
            top = random.uniform(0, height * ratio - height)

            expand_image = np.zeros(
                (int(height * ratio), int(width * ratio), depth),
                dtype=image.dtype)
            expand_image[:, :, :] = self.mean
            expand_image[int(top):int(top + height),
                         int(left):int(left + width)] = image
            image = expand_image

            boxes = boxes.copy()
            boxes[:, :2] += (int(left), int(top))
            boxes[:, 2:] += (int(left), int(top))

            return image, boxes, labels
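To make the arithmetic in Expand concrete, here is a deterministic sketch in which ratio, left and top are passed in instead of being drawn at random. The function name expand and the example values are assumptions for illustration only.

```python
import numpy as np


def expand(image, boxes, mean, ratio, left, top):
    """Deterministic variant of Expand: ratio, left and top are given, not random."""
    height, width, depth = image.shape
    canvas = np.zeros((int(height * ratio), int(width * ratio), depth),
                      dtype=image.dtype)
    canvas[:, :, :] = mean                            # fill the canvas with the mean colour
    canvas[int(top):int(top) + height,
           int(left):int(left) + width] = image       # paste the original image
    shifted = boxes.copy()
    shifted[:, :2] += (int(left), int(top))           # move (xmin, ymin)
    shifted[:, 2:] += (int(left), int(top))           # move (xmax, ymax)
    return canvas, shifted


image = np.ones((4, 6, 3), dtype=np.float32)          # height 4, width 6
boxes = np.array([[1.0, 1.0, 5.0, 3.0]])              # absolute pixel coordinates
canvas, shifted = expand(image, boxes, mean=(104, 117, 123),
                         ratio=2, left=3.7, top=2.2)
```

With these values the canvas is 8 x 12 pixels, the image lands at column 3 and row 2, and the box moves from (1, 1, 5, 3) to (4, 3, 8, 5). Because the objects become smaller relative to the canvas, this step is what produces extra small-object training examples.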
Source: `CSDN`
Author: `weixin_43874764`
Link: `https://blog.csdn.net/weixin_43874764/article/details/104309313`
