SSD in Practice
1. Start by running someone else's code
https://github.com/amdegroot/ssd.pytorch
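Download the code. The repository's markdown file contains a tutorial, but the code itself has a few problems.
Summary of problems
I use the VOC2012 training set.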
Delete the 2007 entry from the image_sets argument of the __init__ function in voc0712.py.
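As a rough sketch of that edit: assuming the default value of image_sets lists both the 2007 and 2012 trainval splits (this default is an assumption, so check it against your own copy of voc0712.py), dropping the 2007 entry leaves only VOC2012.

```python
# Assumed default in voc0712.py; verify against your checkout.
default_image_sets = [('2007', 'trainval'), ('2012', 'trainval')]

# Keep only the non-2007 entries so the loader never looks for VOC2007 files.
image_sets = [(year, split) for year, split in default_image_sets
              if year != '2007']
print(image_sets)
```

In practice you would simply edit the default in the file; the filter above only illustrates what the remaining value should look like.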
Run train.py
RuntimeError: CUDA out of memory. Tried to allocate 176.00 MiB (GPU 0; 2.00 GiB total capacity; 1.23 GiB already allocated; 107.80 MiB free; 1.24 GiB reserved in total by PyTorch)
My machine is too weak (the GPU has only 2 GiB of memory), so lower batch_size.
IndexError: The shape of the mask [8, 8732] at index 0 does not match the shape of the indexed tensor [69856, 1] at index 0
The loss dimensions do not match in ssd.pytorch\layers\modules\multibox_loss.py.
Swap lines 97 and 98 of layers/modules/multibox_loss.py so that they read:
    loss_c = loss_c.view(num, -1)
    loss_c[pos] = 0  # filter out pos boxes for now
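The shape rule behind this fix can be reproduced on a toy example. The sketch below uses NumPy arrays as a stand-in for PyTorch tensors (an assumption made so it runs without a GPU); the sizes 2 and 4 are hypothetical replacements for the 8 and 8732 in the error message.

```python
import numpy as np

num, num_priors = 2, 4                     # toy sizes; the log above shows 8 and 8732
loss_c = np.ones((num * num_priors, 1))    # flattened per-prior loss, shape (8, 1)
pos = np.zeros((num, num_priors), dtype=bool)
pos[0, 1] = True                           # pretend one prior matched an object

try:
    loss_c[pos] = 0                        # mask is (2, 4) but the array is (8, 1)
    failed = False
except IndexError:
    failed = True

loss_c = loss_c.reshape(num, -1)           # now (2, 4), the same shape as the mask
loss_c[pos] = 0                            # masking works after the reshape
print(failed, loss_c.shape)
```

The mask can only index the loss once both have the shape (batch size, number of priors), which is why the view has to come before the masked assignment.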
On line 114, change N = num_pos.data.sum() to
    loc_loss += loss_l.data.item()
    conf_loss += loss_c.data.item()
The loss becomes NaN during training
Lower the learning rate.
StopIteration
Calling next() on an exhausted iterator raises this error; resetting the iterator fixes it:
    try:
        images, targets = next(batch_iterator)
    except StopIteration:
        batch_iterator = iter(data_loader)
        images, targets = next(batch_iterator)
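The same reset can be wrapped in a small generator so the training loop never sees StopIteration. This is a minimal sketch, not code from the repository: cycle_batches is a hypothetical helper name, and a plain list stands in for the data loader.

```python
def cycle_batches(data_loader):
    """Yield batches forever, starting a fresh pass whenever the loader runs out.

    Note: an empty loader would make this loop spin without yielding.
    """
    while True:
        for batch in data_loader:   # each for-loop creates a new iterator
            yield batch


# Stand-in "loader": a plain list of three batches.
fake_loader = ["batch0", "batch1", "batch2"]
batch_iterator = cycle_batches(fake_loader)
first_five = [next(batch_iterator) for _ in range(5)]
print(first_five)
```

Because every pass starts a new iteration over the loader, a loader that shuffles should reshuffle on each pass, just as with the try/except version.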
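Data preprocessing
SSD uses extensive data augmentation, which improves detection of small and occluded objects. The pipeline splits into photometric transforms and geometric transforms: photometric transforms leave the image size unchanged, while geometric transforms mainly change its dimensions. Mean subtraction comes last. Most of the operations are applied at random.
The augmentation pipeline is implemented in augmentations.py: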
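    class SSDAugmentation(object):
        def __init__(self, size=300, mean=(104, 117, 123)):
            self.mean = mean
            self.size = size
            self.augment = Compose([
                ConvertFromInts(),        # convert pixel values from integers to floats
                ToAbsoluteCoords(),       # convert box coordinates from fractions to absolute pixels
                PhotometricDistort(),     # random brightness, contrast, and saturation changes; random channel swap
                Expand(self.mean),        # randomly enlarge the canvas (filled with the mean) and paste the image at a random offset
                RandomSampleCrop(),       # randomly crop the image
                RandomMirror(),           # random horizontal flip
                ToPercentCoords(),        # convert absolute coordinates back to fractions
                Resize(self.size),        # resize to a fixed 300x300
                SubtractMeans(self.mean)  # subtract the per-channel mean
            ])

        def __call__(self, img, boxes, labels):
            return self.augment(img, boxes, labels)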
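To make the chaining concrete, here is a minimal sketch of what a Compose-style wrapper does, assuming every transform takes and returns (img, boxes, labels). The two helper functions are hypothetical, simplified stand-ins for ToAbsoluteCoords and ToPercentCoords; they treat the image as a bare (height, width) tuple instead of a pixel array.

```python
class Compose:
    """Apply transforms in order; each takes and returns (img, boxes, labels)."""

    def __init__(self, transforms):
        self.transforms = transforms

    def __call__(self, img, boxes=None, labels=None):
        for t in self.transforms:
            img, boxes, labels = t(img, boxes, labels)
        return img, boxes, labels


def to_absolute(img, boxes, labels):
    # Hypothetical stand-in: scale fractional [x1, y1, x2, y2] boxes to pixels.
    h, w = img
    return img, [[x1 * w, y1 * h, x2 * w, y2 * h] for x1, y1, x2, y2 in boxes], labels


def to_percent(img, boxes, labels):
    # Hypothetical stand-in: scale pixel boxes back to fractions of the image size.
    h, w = img
    return img, [[x1 / w, y1 / h, x2 / w, y2 / h] for x1, y1, x2, y2 in boxes], labels


img = (200, 400)                       # (height, width) only, no pixel data
boxes = [[0.25, 0.5, 0.75, 1.0]]
_, abs_boxes, _ = Compose([to_absolute])(img, boxes, [7])
_, rel_boxes, _ = Compose([to_absolute, to_percent])(img, boxes, [7])
print(abs_boxes, rel_boxes)
```

The geometric transforms sit between the two coordinate conversions so that they can work in pixel units while the labels stored on disk and fed to the network stay fractional.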
Brightness adjustment in the photometric transforms
    class RandomBrightness(object):
        def __init__(self, delta=32):
            assert delta >= 0.0
            assert delta <= 255.0
            self.delta = delta

        def __call__(self, image, boxes=None, labels=None):
            if random.randint(2):
                delta = random.uniform(-self.delta, self.delta)
                image += delta
            return image, boxes, labels
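With probability 0.5, a single value drawn from the interval [-32, 32) is added to every element of the image. Here random is numpy.random, so random.randint(2) returns 0 or 1.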
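A standalone sketch of the same idea, assuming NumPy is available. The function name random_brightness and the seeded generator are assumptions introduced here for reproducibility; the original class uses the global numpy.random state instead.

```python
import numpy as np

rng = np.random.default_rng(0)   # seeded generator (an assumption, for repeatable runs)


def random_brightness(image, delta=32.0, rng=rng):
    """With probability 0.5, add one uniform offset from [-delta, delta) to every pixel."""
    if rng.integers(2):                      # 0 or 1, each with probability 0.5
        image = image + rng.uniform(-delta, delta)
    return image


# Float image, as it would be after ConvertFromInts.
image = np.full((2, 2, 3), 128, dtype=np.float32)
out = random_brightness(image)
shift = out - image                          # the same offset at every position
print(shift.min(), shift.max())
```

Like the class above, this sketch does not clip the result, so values can leave the 0 to 255 range after the shift.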
    class RandomLightingNoise(object):
        def __init__(self):
            self.perms = ((0, 1, 2), (0, 2, 1),
                          (1, 0, 2), (1, 2, 0),
                          (2, 0, 1), (2, 1, 0))

        def __call__(self, image, boxes=None, labels=None):
            if random.randint(2):
                swap = self.perms[random.randint(len(self.perms))]
                shuffle = SwapChannels(swap)  # shuffle channels
                image = shuffle(image)
            return image, boxes, labels


    class Expand(object):
        def __init__(self, mean):
            self.mean = mean

        def __call__(self, image, boxes, labels):
            if random.randint(2):
                return image, boxes, labels

            height, width, depth = image.shape
            ratio = random.uniform(1, 4)
            left = random.uniform(0, width*ratio - width)
            top = random.uniform(0, height*ratio - height)

            expand_image = np.zeros(
                (int(height*ratio), int(width*ratio), depth),
                dtype=image.dtype)
            expand_image[:, :, :] = self.mean
            expand_image[int(top):int(top + height),
                         int(left):int(left + width)] = image
            image = expand_image

            boxes = boxes.copy()
            boxes[:, :2] += (int(left), int(top))
            boxes[:, 2:] += (int(left), int(top))

            return image, boxes, labels
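The box arithmetic in Expand can be checked by hand. The sketch below uses hypothetical numbers (a 300x200 image, ratio fixed at 2, a fixed paste offset) in place of the random draws, and expand_boxes is an illustrative helper, not a function from the repository.

```python
def expand_boxes(boxes, left, top):
    """Shift absolute [x1, y1, x2, y2] boxes by the integer paste offset."""
    dx, dy = int(left), int(top)
    return [[x1 + dx, y1 + dy, x2 + dx, y2 + dy] for x1, y1, x2, y2 in boxes]


# Hypothetical values; in Expand these come from random.uniform.
width, height, ratio = 300, 200, 2.0
canvas_w, canvas_h = int(width * ratio), int(height * ratio)   # enlarged canvas size
left, top = 120.7, 50.2        # would be drawn from [0, 300) and [0, 200)

boxes = [[10, 20, 110, 120]]
shifted = expand_boxes(boxes, left, top)
print((canvas_w, canvas_h), shifted)
```

The box keeps its pixel size while the canvas grows, so after the later resize to 300x300 the object occupies a smaller share of the image, which is how this step produces extra small-object training samples.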
Source: CSDN
Author: weixin_43874764
Link: https://blog.csdn.net/weixin_43874764/article/details/104309313