Reproducing Classic Networks (Part 2): SegNet

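1. Paper Summary
SegNet is structurally similar to FCN, except that its encoder uses the 13 convolutional layers of VGG16 and stores the max-pooling indices during pooling. During upsampling, each value is put back at its original position using the stored indices, all other positions are set to 0, and the sparse map is then passed through deconvolution layers.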

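The benefits of this design are:
1) Better boundary delineation
2) Fewer parameters to train end to end (saving memory compared with FCN)
3) This form of upsampling can be used in many encoder-decoder architectures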

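Some work introduces RNNs and conditional random fields (CRF) alongside the decoder for prediction, which helps boundary delineation, and points out that the CRF-RNN approach can be appended to any deep segmentation model, including SegNet.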

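Existing multi-scale deep network architectures usually take one of two forms:

1) Rescale the input to several scales and compute a feature map for each scale
2) Feed a single image to the model and take feature maps from different layers
The shared idea is to use multi-scale information to fuse the semantic information in high-level feature maps with the fine spatial detail in low-level feature maps. However, these methods have many parameters and are hard to train.
Reference blog: https://blog.csdn.net/u011974639/article/details/78916327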

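2. Network Architecture

The full version of SegNet has 13 convolutional layers and 5 pooling layers, matched by 13 deconvolutional layers and 5 upsampling layers.
I reproduced a basic version with 4 convolutional layers and 4 pooling layers, matched by 4 deconvolutional layers and 4 upsampling layers.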

[7,7,3,64]

[7,7,64,64]

conv3(conv+bias+batchnormal+relu) [7,7,64,64]

conv4(conv+bias+batchnormal+relu) [7,7,64,64]

[7,7,64,64]

[7,7,64,64]

[7,7,64,64]

[1,1,64,NUM_CLASSES]
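To get a feel for the size of the basic model, the kernel shapes listed above can be turned into a weight count. This is a minimal sketch that assumes each shape is [height, width, in_channels, out_channels] and that the eight shapes as listed are the complete set of kernels; biases and batch-normalization parameters are not counted. The helper names and the value 11 for NUM_CLASSES are hypothetical, chosen only for illustration.

```python
def count_weights(kernel_shapes):
    """Sum height * width * in_channels * out_channels over all kernels."""
    total = 0
    for kh, kw, c_in, c_out in kernel_shapes:
        total += kh * kw * c_in * c_out
    return total


def basic_kernel_shapes(num_classes):
    """Kernel shapes in the order listed above (assumed complete)."""
    return ([[7, 7, 3, 64]]
            + [[7, 7, 64, 64]] * 6
            + [[1, 1, 64, num_classes]])


# num_classes=11 is a hypothetical value used only for this example.
total_weights = count_weights(basic_kernel_shapes(num_classes=11))
print(total_weights)
```

Almost all of the weights sit in the six 7x7x64x64 kernels; the first layer and the 1x1 classifier contribute comparatively little.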

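As can be seen:
1) Batch normalization is applied after conv+bias

3) When computing the loss, the original paper multiplies the loss of each class by a different weight to achieve class balancing. I did not find a corresponding implementation in TensorFlow, so I computed the plain cross-entropy and did not address class balance.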
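For readers who do want class balancing, the idea can be sketched in NumPy. This is not the code used in this reproduction. It assumes a simplified form of median frequency balancing (weight = median class frequency / class frequency, with frequencies taken over all labelled pixels), and the function names are hypothetical.

```python
import numpy as np


def median_frequency_weights(labels, num_classes):
    """Per-class weights: median frequency divided by class frequency."""
    counts = np.bincount(labels.ravel(), minlength=num_classes).astype(np.float64)
    freq = counts / counts.sum()
    present = freq > 0                      # ignore classes that never occur
    weights = np.zeros(num_classes)
    weights[present] = np.median(freq[present]) / freq[present]
    return weights


def weighted_cross_entropy(logits, labels, class_weights):
    """logits: [N, C] scores, labels: [N] integer class ids."""
    shifted = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    per_pixel = -log_probs[np.arange(labels.size), labels]
    return float(np.mean(class_weights[labels] * per_pixel))


labels = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 2])
weights = median_frequency_weights(labels, num_classes=3)
loss = weighted_cross_entropy(np.zeros((10, 3)), labels, weights)
print(weights, loss)
```

Rare classes get a weight above 1 and frequent classes a weight below 1, so mistakes on rare classes contribute more to the loss.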

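3. Small Tricks from the Reproduction

1) How TensorFlow reads data
Reference blog:
https://blog.csdn.net/lujiandong1/article/details/53376802

TensorFlow has three ways of reading data:

Preloaded data: the data is loaded in advance.
Feeding: Python produces the data and feeds it to the backend.
Reading from file: the data is read directly from files.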

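Drawbacks of the first two approaches:
1. Preloading: the data is embedded directly in the Graph, and the Graph is then passed to the Session to run. With a large dataset, transferring the Graph becomes inefficient.

2. Feeding: placeholders stand in for the data and are filled at run time.

The first two methods are convenient but struggle with large datasets. Even Feeding adds noticeable overhead through its extra steps, such as data type conversion. The best option is to define the file-reading logic inside the Graph and let TF read from the files itself and decode them into usable samples.

Here we use the third method, reading the data directly from files.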

Code example:

    import tensorflow as tf
    from tensorflow.python.framework import ops

    # imgs_dir and labels_dir are the lists of image paths and label paths.
    # First convert both lists to tensors.
    imgs_tensor = ops.convert_to_tensor(imgs_dir, dtype=tf.string)
    labels_tensor = ops.convert_to_tensor(labels_dir, dtype=tf.string)

    # Build the queue. slice_input_producer shuffles by default,
    # so shuffling is turned off at test time to keep the order.
    filename_queue = tf.train.slice_input_producer([imgs_tensor, labels_tensor],
                                                   shuffle=FLAGS.train)

    # Read the image name and the label name from the queue.
    image_filename = filename_queue[0]
    label_filename = filename_queue[1]

    # Read the raw bytes of the image and the ground truth with tf.read_file.
    imgs_values = tf.read_file(image_filename)
    label_values = tf.read_file(label_filename)

    # Decode the bytes back into images.
    imgs_decoded = tf.image.decode_png(imgs_values)
    labels_decoded = tf.image.decode_png(label_values)

    # Reshape to the original shape.
    imgs_reshaped = tf.reshape(imgs_decoded, [FLAGS.img_height, FLAGS.img_width, 3])
    labels_reshaped = tf.reshape(labels_decoded, [FLAGS.img_height, FLAGS.img_width, 1])

    # Convert the data type.
    imgs_reshaped = tf.cast(imgs_reshaped, tf.float32)

    # Minimum number of examples kept in the queue, usually a fraction of the
    # total sample count. With a very large dataset, choose a smaller fraction,
    # otherwise this minimum becomes too large.
    min_fraction_of_examples_in_queue = FLAGS.fraction_of_examples_in_queue
    min_queue_examples = int(FLAGS.num_examples_epoch_train * min_fraction_of_examples_in_queue)
    print('Filling queue with %d input images before starting to train. '
          'This may take some time.' % min_queue_examples)

    # Shuffle during training; keep the order unchanged during testing.
    if FLAGS.train:
        images_batch, labels_batch = tf.train.shuffle_batch(
            [imgs_reshaped, labels_reshaped],
            batch_size=FLAGS.batch_size,
            num_threads=6,
            capacity=min_queue_examples + 3 * FLAGS.batch_size,
            min_after_dequeue=min_queue_examples)
    else:
        # A single thread keeps the batches in queue order.
        images_batch, labels_batch = tf.train.batch(
            [imgs_reshaped, labels_reshaped],
            batch_size=FLAGS.batch_size,
            num_threads=1,
            capacity=min_queue_examples + 3 * FLAGS.batch_size)

2) Convolution kernels are initialized with the method of He et al.,
that is, initializer=tf.contrib.layers.variance_scaling_initializer()
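The idea behind He initialization can be sketched in NumPy: weights are drawn with a standard deviation of sqrt(2 / fan_in), where fan_in is the number of inputs feeding one output unit. This is an illustrative sketch, not the TensorFlow implementation; it assumes a plain normal distribution for brevity, and the function name he_normal is hypothetical.

```python
import numpy as np


def he_normal(shape, rng):
    """Draw a [kh, kw, c_in, c_out] kernel with std = sqrt(2 / fan_in)."""
    kh, kw, c_in, _ = shape
    fan_in = kh * kw * c_in                 # inputs seen by one output unit
    std = np.sqrt(2.0 / fan_in)
    return rng.normal(0.0, std, size=shape)


rng = np.random.default_rng(0)
kernel = he_normal((7, 7, 64, 64), rng)
print(kernel.shape, kernel.std())
```

Scaling by fan_in keeps the variance of the activations roughly constant from layer to layer when ReLU is used, which is why it suits the conv+relu blocks here.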

3) Common variable initialization methods
Reference blog: https://blog.csdn.net/zlrai5895/article/details/80550924

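4) Using batch_normalization
Introduction: https://blog.csdn.net/hjimce/article/details/50866313
Benefits: faster convergence; dropout and the L2 regularization term can be omitted; LRN is no longer needed.

Usage reference blogs: https://blog.csdn.net/candy_gl/article/details/79551149
https://blog.csdn.net/zlrai5895/article/details/80551528

Note that the model is not saved in the way the second blog post proposes; see the code for details.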

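With batch_normalization in the graph, the loss and the optimizer must be built under a dependency on the update ops:

    update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
    with tf.control_dependencies(update_ops):
        # sparse_softmax_cross_entropy_with_logits expects int32/int64 labels
        # of shape [batch, height, width]; the cast is a no-op if already int32.
        sparse_labels = tf.cast(tf.squeeze(labels, axis=[3]), tf.int32)
        loss = tf.reduce_mean(
            tf.nn.sparse_softmax_cross_entropy_with_logits(labels=sparse_labels, logits=logits))
        train_op = tf.train.AdamOptimizer(0.001).minimize(loss)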
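The reason for the dependency on UPDATE_OPS is that batch normalization keeps moving averages of the mean and variance for use at test time, and the ops that update them are not run unless something depends on them. The training-time computation can be sketched in NumPy as follows. This is an illustrative sketch rather than the TensorFlow implementation; the function name and the decay and eps defaults are assumptions.

```python
import numpy as np


def batch_norm_train(x, gamma, beta, moving_mean, moving_var, decay=0.9, eps=1e-5):
    """Normalize x ([batch, features]) with batch statistics and
    return the updated moving averages used later at test time."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    y = gamma * (x - mean) / np.sqrt(var + eps) + beta
    # These two updates play the role of the ops stored in UPDATE_OPS.
    new_moving_mean = decay * moving_mean + (1.0 - decay) * mean
    new_moving_var = decay * moving_var + (1.0 - decay) * var
    return y, new_moving_mean, new_moving_var


x = np.array([[1.0, 2.0], [3.0, 6.0]])
y, mm, mv = batch_norm_train(x, gamma=1.0, beta=0.0,
                             moving_mean=np.zeros(2), moving_var=np.ones(2))
print(y, mm, mv)
```

If the moving averages are never updated, training still works because it uses batch statistics, but inference with the stale averages gives poor results.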

When restoring the model from pre-trained weights:

    sess.run(tf.global_variables_initializer())
    sess.run(tf.local_variables_initializer())

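5) Writing the unpool_with_argmax layer
This step upsamples the pooled feature map back to its size before pooling and uses the stored pool_indices to put each maximum back at its original position.
The key function is tf.scatter_nd()
Usage reference blog: https://blog.csdn.net/zlrai5895/article/details/80551056

Implementation:

    import numpy as np
    import tensorflow as tf

    def unpool_with_argmax(pool, ind, output_shape, name=None, ksize=[1, 2, 2, 1]):
        """
        Unpooling layer after max_pool_with_argmax.
        Args:
            pool:         max pooled output tensor
            ind:          argmax indices
            output_shape: shape of the tensor before pooling
            ksize:        ksize is the SAME as for the pool
        Return:
            unpool:       unpooling tensor
        """
        # default_name keeps the scope valid when name is None.
        with tf.variable_scope(name, default_name='unpool'):
            input_shape = pool.get_shape().as_list()   # static shape of the input tensor
            flat_input_size = np.prod(input_shape)     # np.prod multiplies all elements of the list
            # Flatten the indices and the pooled values.
            ind_ = tf.cast(tf.reshape(ind, [flat_input_size, 1]), tf.int32)
            pool_ = tf.reshape(pool, [flat_input_size])
            # Flattened output shape.
            flat_output_shape = tf.constant(
                [output_shape[0] * output_shape[1] * output_shape[2] * output_shape[3]])
            # Scatter the values into a zero tensor. Assumes each index in ind is a
            # unique flat position in output_shape; scatter_nd sums duplicates.
            ret = tf.scatter_nd(ind_, pool_, flat_output_shape)
            ret = tf.reshape(ret, output_shape)        # reshape to the output shape
            return ret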
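The scatter step can be checked by hand with a NumPy analogue. This is a minimal sketch, not the TensorFlow code above: np.add.at stands in for tf.scatter_nd, the function name unpool_flat is hypothetical, and the indices are written out manually for a single 4x4 map pooled with 2x2 windows.

```python
import numpy as np


def unpool_flat(pool, ind, output_shape):
    """Place each pooled value at its flat index in a zero tensor."""
    flat = np.zeros(int(np.prod(output_shape)), dtype=pool.dtype)
    np.add.at(flat, ind.ravel(), pool.ravel())   # like scatter_nd: duplicates are summed
    return flat.reshape(output_shape)


# Pooled values and their flat positions in a [1, 4, 4, 1] tensor.
pool = np.array([[[[5.0], [7.0]], [[9.0], [3.0]]]])
ind = np.array([[[[5], [2]], [[12], [15]]]])
unpooled = unpool_flat(pool, ind, (1, 4, 4, 1))
print(unpooled[0, :, :, 0])
```

Each maximum returns to the position it came from and every other position stays 0, which is the sparse map the decoder convolutions then fill in.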

6) Using LRN (local response normalization layer)
Reference blog: https://blog.csdn.net/yangdashi888/article/details/77918311
It has been shown to be of little use.

7) Keeping the indices during max pooling

tf.nn.max_pool_with_argmax()
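What this op returns can be illustrated with a small NumPy analogue for a single-channel 2D map and 2x2 windows with stride 2. It is a sketch for intuition only: the function name is hypothetical, and the flat index used here is simply row * width + column, which may differ from the exact flattening convention of the TensorFlow op.

```python
import numpy as np


def max_pool_2x2_with_argmax(x):
    """x: [H, W] array with even H and W. Returns (pooled, flat_indices)."""
    h, w = x.shape
    # windows[i, j] holds the four values of the 2x2 window at output (i, j).
    windows = (x.reshape(h // 2, 2, w // 2, 2)
                .transpose(0, 2, 1, 3)
                .reshape(h // 2, w // 2, 4))
    local = windows.argmax(axis=2)                      # 0..3 inside each window
    rows = 2 * np.arange(h // 2)[:, None] + local // 2  # row of each maximum in x
    cols = 2 * np.arange(w // 2)[None, :] + local % 2   # column of each maximum in x
    return windows.max(axis=2), rows * w + cols


x = np.array([[1, 2, 7, 0],
              [3, 5, 4, 6],
              [0, 1, 2, 1],
              [9, 8, 0, 3]])
pooled, indices = max_pool_2x2_with_argmax(x)
print(pooled)
print(indices)
```

The indices are exactly what the decoder needs later to put each maximum back in place during unpooling.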

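8) Using tf.cond()
In TensorFlow, tf.cond() resembles if...else... in C and controls the flow of data, but the resemblance is only loose and the differences are significant. Both branches are built into the graph, and only the ops created inside the chosen branch function are executed. In the code below (a, b, x and y are existing tensors, with x and y scalars), z is created outside both lambdas, so it is computed whichever branch is taken.
Code:

    z = tf.multiply(a, b)
    result = tf.cond(x < y, lambda: tf.add(x, z), lambda: tf.square(y))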

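9) collection
TensorFlow collections provide a global storage mechanism that is not affected by variable name scopes. Store a value in one place and retrieve it anywhere.

    tf.add_to_collection(name, value)            # store a value in a collection
    tf.Graph.get_collection(name, scope=None)    # retrieve the values from a collection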

10) tf.one_hot() converts labels to one-hot form
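The conversion itself is simple to sketch in NumPy. The snippet below is an analogue for illustration, not tf.one_hot; the function name one_hot and the tiny label map are made up for the example.

```python
import numpy as np


def one_hot(labels, depth):
    """Turn integer class ids of any shape into one-hot vectors on a new last axis."""
    labels = np.asarray(labels)
    return np.eye(depth, dtype=np.float32)[labels]


label_map = np.array([[0, 2],
                      [1, 1]])          # a 2x2 label image with 3 classes
encoded = one_hot(label_map, depth=3)
print(encoded.shape)
print(encoded[0, 1])
```

For a segmentation label of shape [height, width], the result has shape [height, width, num_classes], matching the layout of the logits.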

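11) Instance segmentation is really the point where semantic segmentation and object detection converge, and it is an important research problem. I look forward to instance segmentation algorithms and network architectures that combine the strengths of both semantic segmentation and object detection. (Mask R-CNN and related work are making progress here.)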

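12) Singular value decomposition of an array
Reference blog: https://blog.csdn.net/u012162613/article/details/42214205
Code:

    import numpy as np
    from numpy import linalg as la

    A = np.array([[1, 2, 3], [4, 5, 6]])
    # U: left singular vectors, sigma: singular values, VT: transposed right singular vectors
    U, sigma, VT = la.svd(A)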

13) numpy.prod

    numpy.prod(a, axis=None, dtype=None, out=None, keepdims=<no value>)

Returns the product of the array elements over the given axis.
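A short example shows the effect of the axis argument, and how the product of a shape list gives the flattened size used when reshaping tensors. The shape values are made up for illustration.

```python
import numpy as np

a = np.array([[1, 2],
              [3, 4]])

prod_all = np.prod(a)            # product of every element
prod_axis0 = np.prod(a, axis=0)  # product down each column
prod_axis1 = np.prod(a, axis=1)  # product along each row

# Flattened size of a tensor with a hypothetical shape [4, 2, 2, 64].
flat_size = int(np.prod([4, 2, 2, 64]))
print(prod_all, prod_axis0, prod_axis1, flat_size)
```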

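14) Comparing the two queue-based batch reading methods, tf.train.slice_input_producer and tf.train.string_input_producer
Reference blog: https://blog.csdn.net/qq_30666517/article/details/79715045

With tf.train.string_input_producer(path), the paths are passed in directly and do not need to be wrapped in a list. The reader used to load the images is then tf.WholeFileReader(). Otherwise its usage is much the same as tf.train.slice_input_producer().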

15)tf.convert_to_tensor

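16) tf.concat
tf.concat joins tensors along a given dimension. In TensorFlow 1.0 and later the signature is as follows (older versions took concat_dim as the first argument):

    tf.concat(values, axis, name='concat')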

17) numpy.bincount(x) explained
It returns, for each index value, the number of times that value occurs in x.
Reference blog: https://blog.csdn.net/xlinsist/article/details/51346523
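A small example makes the behaviour concrete, including the minlength argument that pads the result to a fixed number of bins. The input values are made up.

```python
import numpy as np

x = np.array([0, 1, 1, 3])

counts = np.bincount(x)                      # one bin per value from 0 to max(x)
padded = np.bincount(x, minlength=6)         # at least 6 bins, extra bins are 0
print(counts, padded)
```

The minlength argument matters when counting classes: it guarantees one bin per class even if the highest class ids never occur in x.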

18) numpy.diag()
Returns the diagonal elements of a matrix, or builds a diagonal matrix from a 1-D array.
Reference: https://jingyan.baidu.com/article/59703552e03ce18fc0074005.html
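Both uses can be seen in a two-line example; which one applies depends on whether the input is 2-D or 1-D.

```python
import numpy as np

m = np.array([[1, 2],
              [3, 4]])

diagonal = np.diag(m)            # 2-D input: extract the diagonal
matrix = np.diag([1, 2])         # 1-D input: build a diagonal matrix
print(diagonal)
print(matrix)
```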

19) Evaluating the predictions:
Three metrics are used: overall accuracy, per-class accuracy, and IoU (intersection of prediction and label divided by their union)
Code:

    import numpy as np

    def predict_eval(predictions, label_tensor):
        labels = label_tensor
        num_class = FLAGS.num_class
        size = predictions.shape[0]
        # hist[i, j] counts pixels of true class i predicted as class j.
        hist = np.zeros((num_class, num_class))
        for i in range(size):
            hist += fast_hist(labels[i].flatten(), predictions[i].argmax(2).flatten(), num_class)
        acc_total = np.diag(hist).sum() / hist.sum()
        print('accuracy = %f' % np.nanmean(acc_total))
        iu = np.diag(hist) / (hist.sum(1) + hist.sum(0) - np.diag(hist))
        print('mean IU  = %f' % np.nanmean(iu))
        for ii in range(num_class):
            if float(hist.sum(1)[ii]) == 0:
                acc = 0.0
            else:
                acc = np.diag(hist)[ii] / float(hist.sum(1)[ii])
            print("    class # %d accuracy = %f " % (ii, acc))

    def fast_hist(a, b, n):
        # Keep only valid labels, then count each (label, prediction) pair.
        k = (a >= 0) & (a < n)
        return np.bincount(n * a[k].astype(int) + b[k], minlength=n**2).reshape(n, n)
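The three metrics are easy to verify by hand on a tiny case. The sketch below uses a compact variant of the histogram trick with made-up labels and predictions for four pixels and two classes; the function name confusion_matrix is hypothetical.

```python
import numpy as np


def confusion_matrix(labels, preds, n):
    """hist[i, j] = number of pixels with true class i predicted as class j."""
    valid = (labels >= 0) & (labels < n)
    return np.bincount(n * labels[valid] + preds[valid], minlength=n ** 2).reshape(n, n)


labels = np.array([0, 0, 1, 1])
preds = np.array([0, 1, 1, 1])

hist = confusion_matrix(labels, preds, 2)
overall_acc = np.diag(hist).sum() / hist.sum()          # correct pixels / all pixels
class_acc = np.diag(hist) / hist.sum(1)                 # correct / true pixels per class
iou = np.diag(hist) / (hist.sum(1) + hist.sum(0) - np.diag(hist))
print(hist, overall_acc, class_acc, iou.mean())
```

Here one class-0 pixel is mistaken for class 1, so class 1 has perfect accuracy but an IoU below 1, because IoU also penalizes false positives.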

4. Source Code

Code:

5. Experimental Results
