yolo v3做行人检测+alexnet做上衣颜色识别

参考链接：

前言：

本项目是基于yolo v3已经训练好的coco数据集的模型进行行人检测的，然后将检测出来的结果传入到alexnet训练好的上衣颜色识别模型做识别。alexnet训练上衣颜色的数据集可以采用RAP数据集进行训练，后面会详细说rap数据集的坑。最终结果是：alexnet的val准确率大概在76%左右【这个感觉可以改成VGG16试一下，效果可能会更好】，每一帧做测试的时候需要400多秒.

requirement:

ubuntu14.04【ps:ubuntu 16.04更好，因为支持cuda9.0】

python3.6.1

keras2.1.2

tensorflow-gpu==1.4.1

一.步骤说明：

1）先按照keras版的yolov3的视频代码，框出行人的位置

2）训练上衣颜色识别的模型，原版的【参考链接2】是识别上衣颜色+上衣类别，我需要删掉上衣类别这个网络模型，然后进行训练

3)将第一步得到的行人的位置传入第二步训练好的模型中，然后返回到Yolo v3的视频检测中继续跑

二.对于RAP数据集的坑：

首先RAP数据集的label是mat文件，这就比较棘手了，还好github上有转化为xml格式的代码，直接拿来用

也可以直接用我写的，统计了颜色种类+每种上衣颜色的数量

1）RAP数据集的上衣颜色的种类：

# -*- coding:utf-8 -*- 
__author__ = 'xuy'

# data=scipy.io.loadmat('RAP/RAP_annotation/RAP_annotation.mat')


##this script extract relevant data from .mat file in RAP dataset
import scipy.io
import numpy as np
# import csv
# import datetime
# from datetime import datetime
from datetime import timedelta
# import os
import pandas as pd
shirt_color={}

def loadmat_and_extract(file, root_dir):
    ##load the .mat file
    # mat = scipy.io.loadmat('./RAP_annotation/RAP_annotation.mat') #we have desired objects in mat now
    mat = scipy.io.loadmat(file)
    ##there are key value pairs in mat of which we want wiki key and its values
    # There are totally 7 varibals in RAP_annotation, including imagesname, position, label, partion, attribute_chinese, attribute_eng, attribute_exp.
    print(mat.keys())
    data = mat['RAP_annotation']
    images = data['imagesname']
    labels = data['label']
    eng_attr = data['attribute_eng']
    pos = data['position']

    ## Extracting required labels only
    # 0 -> Gender Pr1
    # 1-3 -> Age
    # 15-23 -> Upper Body
    # 24-29 -> Lower Body
    # 35-42 -> attachments/accessories
    # 51-54 -> face direction
    # 55-62 -> occlusion
    # 63-74 -> upper color
    # 75-82 -> lower color

    ## putting wordy attributes in place of 1's in labels
    req_labels = labels[0][0].astype(str)
    for imgnum in range(0, len(req_labels)):
        for lblnum in range(0, len(req_labels[imgnum])):
            if req_labels[imgnum][lblnum] == '1':
                req_labels[imgnum][lblnum] = eng_attr[0][0][lblnum][0][0]

    # for now taking gender, upper body, lower body, face direction, upper color and lower colr
    # req_labels2 = np.ndarray((41585,1))
    # from set import Set
    req_labels2 = []
    lbl_idx = [0] + list(range(15, 23 + 1)) + list(range(24, 29 + 1)) + list(range(51, 54 + 1)) + list(
        range(63, 74 + 1)) + list(range(75, 82 + 1))
    for imgnum in range(0, len(req_labels)):
        temp_lbl = []
        for i in range(0, 92):
            if i == 0 and req_labels[imgnum][i] == '0':
                temp_lbl.append("Male")
            elif i == 0 and req_labels[imgnum][i] == '2':
                temp_lbl.append("Unknown")
            elif i in lbl_idx:
                temp_lbl.append(req_labels[imgnum][i])

        req_labels2.append(np.asarray(temp_lbl).reshape(-1, 1))

    # req_labels2 = np.asarray(req_labels2)

    img_names = []
    for i in range(0, len(images[0][0])):
        renamed = str(images[0][0][i][0][0][:-4]).replace('-', '_')
        img_names.append(renamed)
    # img_names[0][:-4]

    ##finding size of images
    import cv2
    #    root_dir = "./RAP_dataset/"
    print("extracting images from root dir %s to get image sizes" % root_dir)

    width = []
    height = []
    for l in range(0, len(img_names)):
        #    print(img_names[l])
        file_loc = root_dir + str(img_names[l] + ".png")
        print(file_loc)
        #    print(file_loc)
        img = cv2.imread(file_loc, 0)
        height.append(img.shape[0])
        width.append(img.shape[1])

    ## Finding top right, topleft, bottomright, bottomleft
    ## fb = fullbody, hs = head-shoulder, ub = upperbody, lb = lowerbody

    bbox = list(pos[0][0])
    fb_xmin = []
    fb_ymin = []
    fb_xmax = []
    fb_ymax = []
    hs_xmin = []
    hs_ymin = []
    hs_xmax = []
    hs_ymax = []
    ub_xmin = []
    ub_ymin = []
    ub_xmax = []
    ub_ymax = []
    lb_xmin = []
    lb_ymin = []
    lb_xmax = []
    lb_ymax = []
    for i in range(0, len(bbox)):
        fb_xmin.append(bbox[i][0])
        fb_ymin.append(bbox[i][1])
        fb_xmax.append(bbox[i][2] + bbox[i][0])
        fb_ymax.append(bbox[i][3] + bbox[i][1])
        hs_xmin.append(bbox[i][4])
        hs_ymin.append(bbox[i][5])
        hs_xmax.append(bbox[i][6] + bbox[i][4])
        hs_ymax.append(bbox[i][7] + bbox[i][5])
        ub_xmin.append(bbox[i][8])
        ub_ymin.append(bbox[i][9])
        ub_xmax.append(bbox[i][10] + bbox[i][8])
        ub_ymax.append(bbox[i][11] + bbox[i][9])
        lb_xmin.append(bbox[i][12])
        lb_ymin.append(bbox[i][13])
        lb_xmax.append(bbox[i][14] + bbox[i][12])
        lb_ymax.append(bbox[i][15] + bbox[i][13])

    ## Saving attribute list
    attr = []
    for i in lbl_idx:
        attr.append(eng_attr[0][0][i][0][0])

    data3 = {'labels': attr}
    df3 = pd.DataFrame(data=data3, index=lbl_idx)
    df3.to_csv("attributes.csv")

    ## Putting all data in dataframe
    data2 = {'images': img_names, 'labels': req_labels2, 'width': width, 'height': height,
             'fb_xmin': fb_xmin, 'fb_xmax': fb_xmax, 'fb_ymin': fb_ymin, 'fb_ymax': fb_ymax,
             'ub_xmin': ub_xmin, 'ub_xmax': ub_xmax, 'ub_ymin': ub_ymin, 'ub_ymax': ub_ymax,
             'hs_xmin': hs_xmin, 'hs_xmax': hs_xmax, 'hs_ymin': hs_ymin, 'hs_ymax': hs_ymax,
             'lb_xmin': lb_xmin, 'lb_xmax': lb_xmax, 'lb_ymin': lb_ymin, 'lb_ymax': lb_ymax}
    df = pd.DataFrame(data=data2)
    return df


def annotate(df):
    # df = pd.read_csv(csvfile)
    for row in df.itertuples():#将mat文件转化为xml文件
        xmlData = open("annotations/" + str(row.images) + ".xml", 'w')
        xmlData.write('<?xml version="1.0"?>' + "\n")
        xmlData.write('<annotation>' + "\n")
        xmlData.write('    ' + '<folder>RAP_dataset/</folder>' + "\n")
        xmlData.write('    ' + '<filename>' \
                      + str(str(row.images) + '.png') + '</filename>' + "\n")
        xmlData.write('    ' + '<size>' + "\n")
        xmlData.write('        ' + '<width>' \
                      + str(row.width) + '</width>' + "\n")
        xmlData.write('        ' + '<height>' \
                      + str(row.height) + '</height>' + "\n")
        xmlData.write('        ' + '<depth>3</depth>' + "\n")
        xmlData.write('    ' + '</size>' + "\n")

        for i in range(0, len(row.labels)):
            #            if row.labels[i] != "0" or row.labels[i] == "['2']":
            #            if row.labels[i] != "0" or row.labels[i] != "2":
            ext_lbl = str(row.labels[i]).replace("[", "").replace("]", "").replace("'", "")
            if ext_lbl != "0" or ext_lbl == "2":
                xmlData.write('    ' + '<object>' + "\n")
                xmlData.write('        ' + '<name>' \
                              + str(ext_lbl) + '</name>' + "\n")

                xmlData.write('        ' + '<pose>Unknown</pose>' + "\n")
                xmlData.write('        ' + '<truncated>0</truncated>' + "\n")
                xmlData.write('        ' + '<difficult>0</difficult>' + "\n")
                if row.labels[i][0][:2] == 'Ma' or row.labels[i][0][:2] == 'Fe':
                    xmlData.write('        ' + '<bndbox>' + "\n")
                    xmlData.write('            ' + '<xmin>' \
                                  + str(row.fb_xmin) + '</xmin>' + "\n")
                    xmlData.write('            ' + '<ymin>' \
                                  + str(row.fb_ymin) + '</ymin>' + "\n")
                    xmlData.write('            ' + '<xmax>' \
                                  + str(row.fb_xmax) + '</xmax>' + "\n")
                    xmlData.write('            ' + '<ymax>' \
                                  + str(row.fb_ymax) + '</ymax>' + "\n")
                    xmlData.write('        ' + '</bndbox>' + "\n")
                if row.labels[i][0][:2] == 'up' or row.labels[i][0][:2] == 'ub':
                    if row.labels[i][0][:2] == 'up':
                        color=row.labels[i][0][3:]
                        shirt_color[color]=shirt_color.get(color,0)+1



                    xmlData.write('        ' + '<bndbox>' + "\n")
                    xmlData.write('            ' + '<xmin>' \
                                  + str(row.ub_xmin) + '</xmin>' + "\n")
                    xmlData.write('            ' + '<ymin>' \
                                  + str(row.ub_ymin) + '</ymin>' + "\n")
                    xmlData.write('            ' + '<xmax>' \
                                  + str(row.ub_xmax) + '</xmax>' + "\n")
                    xmlData.write('            ' + '<ymax>' \
                                  + str(row.ub_ymax) + '</ymax>' + "\n")

                    xmlData.write('        ' + '</bndbox>' + "\n")
                if row.labels[i][0][:3] == 'low' or row.labels[i][0][:2] == 'lb':
                    xmlData.write('        ' + '<bndbox>' + "\n")
                    xmlData.write('            ' + '<xmin>' \
                                  + str(row.lb_xmin) + '</xmin>' + "\n")
                    xmlData.write('            ' + '<ymin>' \
                                  + str(row.lb_ymin) + '</ymin>' + "\n")
                    xmlData.write('            ' + '<xmax>' \
                                  + str(row.lb_xmax) + '</xmax>' + "\n")
                    xmlData.write('            ' + '<ymax>' \
                                  + str(row.lb_ymax) + '</ymax>' + "\n")

                    xmlData.write('        ' + '</bndbox>' + "\n")
                if row.labels[i][0][:2] == 'fa':
                    xmlData.write('        ' + '<bndbox>' + "\n")
                    xmlData.write('            ' + '<xmin>' \
                                  + str(row.hs_xmin) + '</xmin>' + "\n")
                    xmlData.write('            ' + '<ymin>' \
                                  + str(row.hs_ymin) + '</ymin>' + "\n")
                    xmlData.write('            ' + '<xmax>' \
                                  + str(row.hs_xmax) + '</xmax>' + "\n")
                    xmlData.write('            ' + '<ymax>' \
                                  + str(row.hs_ymax) + '</ymax>' + "\n")

                    xmlData.write('        ' + '</bndbox>' + "\n")
                xmlData.write('    ' + '</object>' + "\n")
        xmlData.write('</annotation>' + "\n")
        xmlData.close()


file = 'RAP/RAP_annotation/RAP_annotation.mat'
root_dir = 'RAP/RAP_dataset/'
RAP_anno = loadmat_and_extract(file, root_dir)
# RAP_anno.to_csv('RAP_attributes.csv')
annotate(RAP_anno)

for word in shirt_color:
    print('{} {}'.format(word,(shirt_color[word])))
    # CAM21_2014_02_26_20140226111426_20140226112822_tarid136_frame1728_line1.xml

我们发现一共有12种颜色：

White 7837
Red 4489
Black 21680
Mixture 7006
Green 1951
Gray 9311
Brown 1197
Yellow 1762
Blue 5713
Pink 1104
Purple 718
Orange 485

RAP数据集的上衣颜色并不是以数字命名的，而是up_[color]来进行命名，需要截断up后面的内容，还得注意异常的判断，这个也是其他项目当中需要注意的问题，毕竟并不是所有的数据集都有上衣属性，读了好几个返回None的文件，debug半天才发现

2）上衣的坐标位置

你会发现上衣的坐标位置很奇怪，图片就那么大，但是上衣的位置超过了图片的像素位置，因为上衣位置是相对于male或者female这个属性位置来的，需要做减法

eg:

=============================

此时的xmin在图片中的位置就是646-639+1。以此类推

3）RAP数据集的用途：

问了一下作者，这个数据集仅仅用来图片分类用，不能做行人检测【因为不是全景图】，所以文件名给的frame帧号是没有用的

三.一些经验教训：

1)在调用函数的时候一定要进行异常判断，防止为空的情况，毕竟数据集很大，有一些特殊情况，这些一定要处理，否则debug半天找不出来问题出现在哪里

2）PIL以及opencv的问题

python的PIL和cv2都是做图像处理的库，但是二者有一个明显的区别。所以需要转换

PIL的三通道的编码方式是：RGB

opencv的三通道的编码方式是：BGR

如果做项目拼接的时候遇到此类问题的话，注意转换

3）PIL库的roi以及opencv的roi的区别：

PIL库截取ROI用的是：

roi=img.crop((left,top,right,bottom))#注意：这里是两个括号

opencv使用roi:

roi=img[top:bottom,left:right]

来源：CSDN

作者：mdjxy63

链接：https://blog.csdn.net/mdjxy63/article/details/81563824

标签

yolo

bbox