Reading and writing caffe-ssd LMDB data in Python

Original article: https://zhuanlan.zhihu.com/p/76318150

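LMDB is a database used by Caffe. It is built on memory-mapped files and offers very good I/O performance. AnnotatedDatum is the format Caffe uses to store records in an LMDB database, mainly for the training data of object detection models such as SSD.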
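
To make "memory-mapped file" concrete, here is a minimal sketch that uses only Python's standard mmap module. It is not LMDB itself; the file name and the helper read_slice_via_mmap are made up for this illustration. It only shows the idea LMDB builds on: the file is mapped into the address space and the OS loads just the pages that are touched.

```python
import mmap
import os
import tempfile


def read_slice_via_mmap(path, start, length):
    """Return `length` bytes at offset `start` without reading the whole file."""
    with open(path, "rb") as f:
        # length=0 maps the entire file; the OS loads pages lazily on access.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            return mapped[start:start + length]


# Write a small throwaway file, then read ten bytes from the middle of it.
tmp_dir = tempfile.mkdtemp()
demo_path = os.path.join(tmp_dir, "demo.bin")
with open(demo_path, "wb") as f:
    f.write(b"0123456789" * 1000)

chunk = read_slice_via_mmap(demo_path, 5000, 10)
print(chunk)
```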

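In the official SSD-caffe code (arguably not official, since it is a third-party implementation), training data is loaded by a custom layer that reads an LMDB database, and the LMDB data in the required format is produced by two scripts that call a C++ library. After reading the code I found that this C++ library can only read image files. My training data is in a special format, so to produce a matching LMDB I would either have to modify and recompile Caffe, or write my own tool to generate the LMDB. The latter is clearly simpler, so this article covers how to generate correctly formatted LMDB data in Python for training. The main text follows.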

First we generate a correctly formatted LMDB with the official scripts, then read it back in code to inspect its structure:

import lmdb
import numpy as np
import cv2
import caffe  # needed for caffe.io.datum_to_array below
from caffe.proto import caffe_pb2

lmdb_env = lmdb.open('VOC0712_test_lmdb', readonly=True)
lmdb_txn = lmdb_env.begin()  # open a read transaction
lmdb_cursor = lmdb_txn.cursor()  # cursor used to iterate over the records
annotated_datum = caffe_pb2.AnnotatedDatum()  # AnnotatedDatum message

for key, value in lmdb_cursor:
    annotated_datum.ParseFromString(value)  # parse the LMDB value as an AnnotatedDatum
    datum = annotated_datum.datum  # Datum message
    grps = annotated_datum.annotation_group  # AnnotationGroup messages
    anno_type = annotated_datum.type  # renamed so the builtin type() is not shadowed
    # Each grp is one label class; each grp holds one or more annotations (boxes)
    for grp in grps:
        label = grp.group_label
        for annotation in grp.annotation:
            instance_id = annotation.instance_id
            # bbox values are normalized, so scale them back to pixels
            xmin = annotation.bbox.xmin * datum.width
            ymin = annotation.bbox.ymin * datum.height
            xmax = annotation.bbox.xmax * datum.width
            ymax = annotation.bbox.ymax * datum.height

    # Datum label and its three dimensions
    _ = datum.label  # meaningless for detection datasets; the real label is group_label above
    channels = datum.channels
    height = datum.height
    width = datum.width
    encoded = datum.encoded  # True means datum.data holds an already encoded image
    if encoded:  # encoded data can be decoded directly with cv2
        image = np.frombuffer(datum.data, dtype=np.uint8)  # bytes to a 1-D array
        image = cv2.imdecode(image, -1)  # decode
    else:  # otherwise reshape the raw bytes; datum_to_array() does this
        image = caffe.io.datum_to_array(datum)  # shape (channels, height, width)
        image = np.ascontiguousarray(image.transpose(1, 2, 0))  # to (height, width, channels)
    cv2.imshow("image", image)  # show the image
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
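
The reading loop scales the normalized bbox values by the image width and height. A small sketch of that conversion as a reusable helper follows. The name bbox_to_pixels, the clamping to [0, 1], and the rounding are assumptions added for illustration; they are not part of Caffe.

```python
def bbox_to_pixels(xmin, ymin, xmax, ymax, width, height):
    """Convert a normalized box (values in [0, 1]) to integer pixel corners."""
    def clamp(v):
        # Keep slightly out-of-range values inside the image.
        return min(max(v, 0.0), 1.0)

    left = int(round(clamp(xmin) * width))
    top = int(round(clamp(ymin) * height))
    right = int(round(clamp(xmax) * width))
    bottom = int(round(clamp(ymax) * height))
    return left, top, right, bottom


# One box from a 353 x 500 image, with values as they might be stored in a bbox.
box = bbox_to_pixels(0.1359773427248001, 0.47999998927116394,
                     0.5524079203605652, 0.7419999837875366,
                     width=353, height=500)
print(box)
```

The resulting integers can be passed to a drawing call such as cv2.rectangle as the two corner points (left, top) and (right, bottom).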

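AnnotatedDatum is the data structure Caffe uses to store one annotated sample. It is a message type defined with Google's ProtoBuf (an interface description language) and contains a Datum (which holds the image content) plus a custom annotation_group field (which holds the boxes). AnnotatedDatum is not itself a Datum; it wraps one. Datum is the record type Caffe commonly stores in LMDB, and its fields can be read much like the entries of a dict. For a detailed description of Datum, see these two articles by 祁昆仑 (Qi Kunlun):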

https://zhuanlan.zhihu.com/p/23485774

https://blog.csdn.net/xunan003/article/details/107038675

From the code above, a sample of the AnnotatedDatum structure in Caffe looks like this:

AnnotatedDatum {
    datum {
        channels: 3
        height: 500
        width: 353
        data: ""
        label: -1
        encoded: true
    }
    type: BBOX
    annotation_group {
        group_label: 12
        annotation {
            instance_id: 0
            bbox {
                xmin: 0.1359773427248001
                ymin: 0.47999998927116394
                xmax: 0.5524079203605652
                ymax: 0.7419999837875366
                difficult: false
            }
        }
    }
    annotation_group {
        group_label: 15
        annotation {
            instance_id: 0
            bbox {
                xmin: 0.02266288921236992
                ymin: 0.024000000208616257
                xmax: 0.9971671104431152
                ymax: 0.9959999918937683
                difficult: false
            }
        }
    }
}

The data field of datum holds the encoded image bytes. It is very long, so it is left empty here.

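As the sample shows, an AnnotatedDatum contains one Datum that stores the image data, a type field, and several annotation_group entries that store the gt (ground truth, likewise below) boxes. annotation_group is a custom class generated by ProtoBuf. For how to use ProtoBuf, see this article: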

https://www.jianshu.com/p/b9fcecb96718?utm_campaign=maleskine&utm_content=note&utm_medium=reader_share&utm_source=qzone
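
The nesting described above (one group per label, one annotation per box) can be sketched with plain Python containers before any Caffe class is involved. The helper group_boxes and its dict layout are stand-ins made up for this illustration, not the Caffe API. The sketch assumes boxes arrive as pixel coordinates and that instance_id counts boxes within one label.

```python
from collections import OrderedDict


def group_boxes(pixel_boxes, width, height):
    """Group (xmin, ymin, xmax, ymax, label) pixel boxes by label.

    Returns a list of dicts shaped like annotation_group: one entry per
    label, each holding annotations with a per-label instance_id and a
    bbox normalized to the [0, 1] range.
    """
    groups = OrderedDict()
    for xmin, ymin, xmax, ymax, label in pixel_boxes:
        annotations = groups.setdefault(int(label), [])
        annotations.append({
            "instance_id": len(annotations),  # index of this box within its label
            "bbox": {
                "xmin": xmin / float(width),
                "ymin": ymin / float(height),
                "xmax": xmax / float(width),
                "ymax": ymax / float(height),
                "difficult": False,
            },
        })
    return [{"group_label": label, "annotation": anns}
            for label, anns in groups.items()]


# Three boxes on a 353 x 500 image; two of them share label 12.
boxes = [(48, 240, 195, 371, 12), (8, 12, 352, 498, 15), (100, 50, 200, 150, 12)]
groups = group_boxes(boxes, width=353, height=500)
for group in groups:
    print(group["group_label"], len(group["annotation"]))
```

Each dict in the result maps one-to-one onto the fields of the sample structure, so filling the real ProtoBuf objects from it is a matter of copying values field by field.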

Now that we know the internal structure of AnnotatedDatum, we can start generating an LMDB that Caffe can read:

import numpy as np
import lmdb
import cv2  # needed for cv2.imencode below
from caffe.proto import caffe_pb2

N = 1  # assume we have N images

# N all-white images in the (height, width, channels) layout that cv2.imencode expects
image = np.full((N, 512, 512, 3), 255, dtype=np.uint8)
# Random labels: N images, 3 boxes each, 5 values per box
# (normalized xmin, ymin, xmax, ymax, followed by the class id)
xy_min = np.random.uniform(0.0, 0.5, size=(N, 3, 2))
xy_max = xy_min + np.random.uniform(0.1, 0.5, size=(N, 3, 2))
labels = np.random.randint(1, 21, size=(N, 3, 1))
target = np.concatenate([xy_min, xy_max, labels], axis=2)

# Create the LMDB; map_size is the maximum size in bytes and must be an int
env = lmdb.open('mylmdb', map_size=int(1e12))

with env.begin(write=True) as txn:  # open a write transaction
    # txn is a Transaction object
    for i in range(N):
        str_id = str(i)  # key
        annotated_datum = caffe_pb2.AnnotatedDatum()  # create an AnnotatedDatum
        annotated_datum.type = annotated_datum.BBOX
        for box in target[i]:  # build the box data from target
            annotation_group = annotated_datum.annotation_group.add()  # add a new annotation_group
            (xmin, ymin, xmax, ymax, label) = box
            annotation_group.group_label = int(label)  # set the label
            annotation = annotation_group.annotation.add()  # add an annotation to hold the box
            annotation.instance_id = 0  # index of this box within its class for this image; default to the first
            annotation.bbox.xmin = float(xmin)
            annotation.bbox.ymin = float(ymin)
            annotation.bbox.xmax = float(xmax)
            annotation.bbox.ymax = float(ymax)
            annotation.bbox.difficult = False  # whether the object is hard to recognize; default to no

        datum = annotated_datum.datum  # the datum that holds the image information
        datum.channels = image.shape[3]
        datum.height = image.shape[1]
        datum.width = image.shape[2]
        datum.data = cv2.imencode('.jpg', image[i])[1].tobytes()  # encode the image array to bytes
        datum.encoded = True
        datum.label = -1  # labels live in annotation_group, so this defaults to -1

        # The encode is only essential in Python 3
        txn.put(str_id.encode('ascii'), annotated_datum.SerializeToString())  # save annotated_datum to the LMDB
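
One detail about keys: LMDB keeps records sorted by key in lexicographic byte order, so keys built from a plain str(i) are iterated as 0, 1, 10, 11, 2, ... rather than in numeric order. A minimal sketch of zero-padded keys, which sort the same way as the indices, follows. The helper make_key and the width of 8 digits are assumptions for illustration only.

```python
def make_key(index, width=8):
    """Build a fixed-width ASCII key so byte order matches numeric order."""
    return "{:0{}d}".format(index, width).encode("ascii")


# Compare how twelve keys sort with and without padding.
plain = sorted(str(i).encode("ascii") for i in range(12))
padded = sorted(make_key(i) for i in range(12))

print(plain[:4])
print(padded[:4])
```

If the iteration order matters for your training setup, make_key(i) could replace str_id.encode('ascii') in the txn.put call.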

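Some readers may not follow every line above, and that is fine as long as the overall idea is clear. The classes used and the rules for filling in values all come from Caffe's ProtoBuf definitions. Interested readers can dig into CAFFE_HOME/python/caffe/proto/caffe_pb2.py, which contains the detailed definitions; that file is generated from src/caffe/proto/caffe.proto, which is the more readable of the two.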

The LMDB generated by the code above can be fed directly to Caffe's AnnotatedData layer.

Source: oschina

Link: https://my.oschina.net/u/4407314/blog/4331335
