Paper notes (a survey paper on image retrieval): Content-Based Image Retrieval and Feature Extraction: A Comprehensive Review (Part 1)

Posted by 淺唱寂寞╮ on 2020-03-04 02:07:38

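Citation: Latif, Afshan and Rasheed, Aqsa and Sajid, Umer and Jameel, Ahmed and Ali, Nouman and Ratyal, Naeem Iqbal and Zafar, Bushra and Dar, Saadat and Sajid, Muhammad and Khalil, Tehmina: Content-Based Image Retrieval and Feature Extraction: A Comprehensive Review, Mathematical Problems in Engineering

This is a research paper from a team in Pakistan. I came across it by accident and found it fairly comprehensive and detailed. No paper is completely correct or fully up to date, so treat this post as a shared attempt to reorganize what the paper says about content-based image retrieval and feature extraction. The notes also follow the paper's reference list so that readers can pick out what they need. Everything below is my personal opinion; questions and discussion are welcome.

For what "content-based" means, see the following paper: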

Gudivada, Venkat N., and Vijay V. Raghavan. "Content-based image retrieval systems." Computer 28.9 (1995): 18-22.

 

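From the abstract, the authors' purpose in writing this paper is:

We analyzed the main aspects of various image retrieval and image representation models from low-level feature extraction to recent semantic deep-learning approaches. The important concepts and major research studies based on CBIR and image representation are discussed in detail, and future research directions are concluded to inspire further research in this area.

Compared with earlier retrieval based on metadata and image descriptions, CBIR techniques have advanced in recent years. This paper summarizes research on image content analysis, from image representations built on low-level feature extraction to recent deep-learning-based image description and retrieval, and includes some predictions about where the field is heading.

Now for the introduction:

The authors say that much of today's retrieval works by matching image descriptions against the keywords in a user's query, citing the following papers: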

[4] S. Yang, L. Li, S. Wang, W. Zhang, Q. Huang, and Q. Tian, “SkeletonNet: a hybrid network with a skeleton-embedding process for multi-view image representation learning,” IEEE Transactions on Multimedia, vol. 1, no. 1, 2019.

[5] W. Zhao, L. Yan, and Y. Zhang, “Geometric-constrained multi-view image matching method based on semi-global optimization,” Geo-Spatial Information Science, vol. 21, no. 2, pp. 115–126, 2018.

[6] W. Zhou, H. Li, and Q. Tian, “Recent advance in content-based image retrieval: a literature survey,” 2017, https://arxiv.org/abs/1706.06064.

My personal take: for [4] (https://ieeexplore.ieee.org/document/8695120), I think the authors mainly propose an unsupervised method for multi-view subspace learning. [5] is mainly an image matching method that incorporates geometric constraints for multi-view remote sensing images. [6] is a survey of content-based image retrieval similar in nature to this paper, but it covers techniques from 2003 to 2016; I personally recommend it (https://arxiv.org/pdf/1706.06064.pdf).

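The authors then introduce the basic concepts of CBIR and the features it uses, and describe the background of feature selection:

According to the literature, the selection of visual features for any system is dependent on the requirements of the end user.

The specific choice of features depends on the end user's requirements, and improving retrieval performance can incur a high computational cost: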

[19] N. Ali, Image Retrieval Using Visual Image Features and Automatic Image Annotation, University of Engineering and Technology, Taxila, Pakistan, 2016.

[20] B. Zafar, R. Ashraf, N. Ali et al., “Intelligent image classification-based on spatial weighted histograms of concentric circles,” Computer Science and Information Systems, vol. 15, no. 3, pp. 615–633, 2018.

Poor feature selection can instead degrade system performance, for example:

[12] L. Piras and G. Giacinto, “Information fusion in content based image retrieval: a comprehensive overview,” Information Fusion, vol. 37, pp. 50–60, 2017.

The authors also note that all kinds of features are now widely used in machine learning and deep learning with good results:

ML:

[1] D. Zhang, M. M. Islam, and G. Lu, “A review on automatic image annotation techniques,” Pattern Recognition, vol. 45, no. 1, pp. 346–362, 2012.

[2] Y. Liu, D. Zhang, G. Lu, and W.-Y. Ma, “A survey of content-based image retrieval with high-level semantics,” Pattern Recognition, vol. 40, no. 1, pp. 262–282, 2007.

DL (the authors also grumble that the computational cost is rather high):

[21] G. Qi, H. Wang, M. Haner, C. Weng, S. Chen, and Z. Zhu, “Convolutional neural network based detection and judgement of environmental obstacle in vehicle operation,” CAAI Transactions on Intelligence Technology, vol. 4, no. 2, pp. 80–91, 2019.

[22] U. Markowska-Kaczmar and H. Kwaśnicka, “Deep learning––a new era in bridging the semantic gap,” in Bridging the Semantic Gap in Image and Video Analysis, pp. 123–159, Springer, Basel, Switzerland, 2018.

[23] F. Riaz, S. Jabbar, M. Sajid, M. Ahmad, K. Naseer, and N. Ali, “A collision avoidance scheme for autonomous vehicles inspired by human social norms,” Computers & Electrical Engineering, vol. 69, pp. 690–704, 2018.

The authors therefore state that a major goal of this paper is a comprehensive summary and analysis of the various kinds of features. How do low-level features (geometry, texture, color, and so on) affect retrieval performance? How can the gap between low-level image representation and high-level semantic representation be narrowed? How important is the spatial layout of an image for retrieval and representation? How much does introducing DL and ML improve CBIR performance?

The authors then outline the structure of the paper:

=================================================================================

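Section 2 Color features

Section 3 Texture features

Section 4 Shape features

Section 5 Spatial features

Section 6 Fusion of low-level features

Section 7 Local features

Section 8 Deep-learning-based retrieval

Section 9 Feature extraction for face recognition

Section 10 Distance measures

Section 11 Evaluation criteria for feature extraction and CBIR

Section 12 Future of the related techniques

=================================================================================

To keep the reading manageable, these notes are split into three parts. Part 1 covers Sections 2 to 5.

(References published from 2015 onward are given with a link below.)

Section 2: Color features: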

[24] H. Shao, Y. Wu, W. Cui, and J. Zhang, “Image retrieval based on MPEG-7 dominant color descriptor,” in Proceedings of the 9th International Conference for Young Computer Scientists ICYCS 2008, pp. 753–757, IEEE, Hunan, China, November 2008.

Based on the MPEG-7 dominant color descriptor: 8 dominant colors are selected per image, and image similarity is then computed from histograms.
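To make the "8 dominant colors plus histogram comparison" idea concrete, here is a minimal sketch. It is not the MPEG-7 descriptor or the method of [24]: dominant colors are approximated by the most frequent cells of a coarse RGB grid, and similarity is a plain histogram intersection. The function names and the `bins` parameter are assumptions made for illustration.

```python
import numpy as np

def dominant_colors(image, n_colors=8, bins=4):
    """Map each of the n_colors most frequent coarse RGB cells to its pixel share."""
    q = (image.reshape(-1, 3).astype(int) * bins) // 256  # per-channel bin in [0, bins)
    cell = (q[:, 0] * bins + q[:, 1]) * bins + q[:, 2]    # one grid index per pixel
    counts = np.bincount(cell, minlength=bins ** 3)
    top = np.argsort(counts)[::-1][:n_colors]             # most frequent cells first
    return {int(c): float(counts[c]) / counts.sum() for c in top if counts[c] > 0}

def similarity(d1, d2):
    """Histogram intersection over the dominant cells that both images share."""
    return sum(min(d1[c], d2[c]) for c in d1.keys() & d2.keys())

# Toy images (assumed data): one half red and half blue, one fully red.
half = np.zeros((4, 4, 3), dtype=np.uint8)
half[:2] = (255, 0, 0)
half[2:] = (0, 0, 255)
red = np.zeros((4, 4, 3), dtype=np.uint8)
red[:] = (255, 0, 0)
print(similarity(dominant_colors(half), dominant_colors(red)))
```

A real system would more likely cluster colors in a perceptual color space and use a distance that also accounts for how close two different dominant colors are; the grid version above only shows the overall shape of the pipeline.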

[25] X. Duanmu, “Image retrieval using color moment invariant,” in Proceedings of the 2010 Seventh International Conference on Information Technology: New Generations (ITNG), pp. 200–203, IEEE, Las Vegas, NV, USA, April 2010.

Uses HAC (hierarchical agglomerative clustering) to cluster color features.

[26] X.-Y. Wang, B.-B. Zhang, and H.-Y. Yang, “Content-based image retrieval by integrating color and texture features,” Multimedia Tools and Applications, vol. 68, no. 3, pp. 545–569, 2014.

Uses both texture and color; computing distances and merging the two kinds of features then becomes the hard part (along with the computational cost).

[27] H. Zhang, Z. Dong, and H. Shu, “Object recognition by a complete set of pseudo-Zernike moment invariants,” in Proceedings of the 2010 IEEE International Conference on Acoustics Speech and Signal Processing (ICASSP), pp. 930–933, IEEE, Dallas, TX, USA, March 2010.

Optimizes fitting based on Zernike and pseudo-Zernike polynomials to handle scaling and rotation.

 

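The authors note that color features are hardly affected by basic image transformations (rotation, scaling, translation, and so on), for example: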

[28] J. M. Guo, H. Prasetyo, and J. H. Chen, “Content-based image retrieval using error diffusion block truncation coding features,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 25, no. 3, pp. 466–481, 2015.

https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6898854

Uses error diffusion block truncation coding (EDBTC) to extract features: a color feature and a bitmap feature are extracted and then used for retrieval.
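For readers unfamiliar with block truncation coding, the sketch below shows only the basic idea on a single grayscale block: the block is reduced to two quantizer values plus a bitmap. This is plain mean-thresholded BTC, written under my own assumptions; the error diffusion step and the color handling of [28] are not reproduced, and the function names are hypothetical.

```python
import numpy as np

def btc_encode(block):
    """Encode one grayscale block as (low, high, bitmap)."""
    block = block.astype(float)
    bitmap = block >= block.mean()          # True where the pixel is at or above the mean
    high = block[bitmap].mean()             # representative value for the bright group
    # If every pixel equals the mean there is no dark group, so reuse `high`.
    low = block[~bitmap].mean() if (~bitmap).any() else high
    return low, high, bitmap

def btc_decode(low, high, bitmap):
    """Rebuild the block from the two quantizers and the bitmap."""
    return np.where(bitmap, high, low)

block = np.array([[10, 10], [200, 200]])
low, high, bitmap = btc_encode(block)
restored = btc_decode(low, high, bitmap)
```

In a retrieval setting, the quantizer values act as a compact color feature and the bitmap carries edge and texture-like structure, which matches the two kinds of features mentioned in the note above.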

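[29] Y. Liu, D. Zhang, and G. Lu, “Region-based image retrieval with high-level semantics using decision tree learning,” Pattern Recognition, vol. 41, no. 8, pp. 2554–2570, 2008.

(It is old, but I would still like to recommend this one a little.) This paper uses decision tree learning.

[30] M. M. Islam, D. Zhang, and G. Lu, “Automatic categorization of image regions using dominant color based vector quantization,” in Proceedings of the Digital Image Computing: Techniques and Applications, pp. 191–198, IEEE, Canberra, Australia, December 2008.

This one proposes a color-based quantization method.

[31] Z. Jiexian, L. Xiupeng, and F. Yu, “Multiscale distance coherence vector algorithm for content-based image retrieval,” The Scientific World Journal, vol. 2014, Article ID 615973, 13 pages, 2014. (Personally, I think this one is mainly based on contour features, with a series of computations to achieve robustness to rotation and similar disturbances.)

The authors conclude that although color features do not represent local features well, they do cost less to compute than many region-based features. The paper then reports the retrieval performance of the methods above (see the corresponding comparison in the original paper).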

 

Looking at results on the same dataset, the color quantization method proposed in [30] deserves a closer look.
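The invariance claim for color features in this section can be checked on a toy example. The sketch below computes simple color moments (mean, standard deviation, and skewness per RGB channel) and compares them across a rotation, a circular shift, and a nearest-neighbor upscaling. The random image and the helper name are assumptions for illustration, not something taken from the paper.

```python
import numpy as np

def color_moments(image):
    """Return 9 values: mean, standard deviation, and skewness for each RGB channel."""
    pixels = image.reshape(-1, 3).astype(float)
    mean = pixels.mean(axis=0)
    centered = pixels - mean
    std = np.sqrt((centered ** 2).mean(axis=0))
    skew = np.cbrt((centered ** 3).mean(axis=0))   # cube root of the third central moment
    return np.concatenate([mean, std, skew])

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(8, 8, 3))
rotated = np.rot90(image)                               # 90 degree rotation
shifted = np.roll(image, shift=3, axis=1)               # circular horizontal shift
enlarged = image.repeat(2, axis=0).repeat(2, axis=1)    # 2x nearest-neighbor upscaling

for variant in (rotated, shifted, enlarged):
    print(np.allclose(color_moments(image), color_moments(variant)))
```

The moments match here because each transform only rearranges or duplicates pixels. With interpolated resizing or a translation that moves content out of the frame, the invariance would hold only approximately. The same example also shows why such features say little about local structure: any rearrangement of the pixels gives the same descriptor.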

Next, a summary of texture features:

[32] G. Papakostas, D. Koulouriotis, and V. Tourassis, “Feature extraction based on wavelet moments and moment invariants in machine vision systems,” in Human-Centric Machine Vision, InTech, London, UK, 2012.

Feature extraction based on wavelet moments and moment invariants.

[33] G.-H. Liu, Z.-Y. Li, L. Zhang, and Y. Xu, “Image retrieval based on micro-structure descriptor,” Pattern Recognition, vol. 44, no. 9, pp. 2123–2133, 2011.

The authors of this paper propose micro-structures, which combine HSV color features with edge orientation features (computed with the Sobel operator) to define a new feature map.

[34] X.-Y. Wang, Z.-F. Chen, and J.-J. Yun, “An effective method for color image retrieval based on texture,” Computer Standards & Interfaces, vol. 34, no. 1, pp. 31–35, 2012.

Extracts texture features with a color co-occurrence matrix.
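As a rough illustration of what a co-occurrence matrix measures, the sketch below counts how often one quantized level appears next to another at a fixed offset, then derives a contrast statistic from it. This is a simplified gray-level version under my own assumptions (non-negative offsets, already quantized input); the color variant used in [34] is not reproduced.

```python
import numpy as np

def cooccurrence(levels_img, levels, dx=1, dy=0):
    """Normalized counts of level pairs (i, j) where j lies at offset (dy, dx) from i."""
    h, w = levels_img.shape
    a = levels_img[0:h - dy, 0:w - dx]      # reference pixels
    b = levels_img[dy:h, dx:w]              # neighbors at the chosen offset
    mat = np.zeros((levels, levels), dtype=float)
    np.add.at(mat, (a.ravel(), b.ravel()), 1)   # accumulate one count per pixel pair
    return mat / mat.sum()

def contrast(p):
    """Weighted sum of squared level differences; 0 for perfectly flat regions."""
    i, j = np.indices(p.shape)
    return float(((i - j) ** 2 * p).sum())

img = np.array([[0, 0, 1],
                [0, 0, 1]])
p = cooccurrence(img, levels=2)
print(p)
print(contrast(p))
```

Statistics such as contrast, energy, or entropy computed from this matrix are what typically end up in the texture feature vector.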

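[40] N.-E. Lasmar and Y. Berthoumieu, “Gaussian copula multivariate modeling for texture image retrieval using wavelet transforms,” IEEE Transactions on Image Processing, vol. 23, no. 5, pp. 2246–2261, 2014.

As the title says, this one uses wavelet transforms.

The authors conclude that because texture features describe a group of pixels, they carry more semantic meaning than color features; their drawback is that they are sensitive to noise. The retrieval performance of the methods above is reported in the original paper.

Next, shape features:

[15] D. Zhang and G. Lu, “Review of shape representation and description techniques,” Pattern Recognition, vol. 37, no. 1, pp. 1–19, 2004.

This is a survey of shape features, but it only goes up to 2004.

The authors then draw on the following two papers: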

[14] D. Ping Tian, “A review on image feature extraction and representation techniques,” International Journal of Multimedia and Ubiquitous Engineering, vol. 8, no. 4, pp. 385–396, 2013.

[15] D. Zhang and G. Lu, “Review of shape representation and description techniques,” Pattern Recognition, vol. 37, no. 1, pp. 1–19, 2004.

From these, the authors compile a summary table (see the original paper).

[41] Z. Hong and Q. Jiang, “Hybrid content-based trademark retrieval using region and contour features,” in Proceedings of the 22nd International Conference on Advanced Information Networking and Applications-Workshops AINAW 2008, pp. 1163–1168, IEEE, Okinawa, Japan, March 2008.

This one is mainly a way of representing contour features.

Next, spatial features:

One common approach is bag of visual words (https://towardsdatascience.com/bag-of-visual-words-in-a-nutshell-9ceea97ce0fb). Bag of words (BoW) is an NLP method based on word frequency counts; applied to images, each local feature plays the role of a word.

The linked article illustrates the general idea. An image is then represented by these features:

[42] N. Ali, K. B. Bajwa, R. Sablatnig et al., “A novel image retrieval based on visual words integration of SIFT and SURF,” PLoS One, vol. 11, no. 6, Article ID e0157428, 2016.

Uses SIFT (robust to rotation) and SURF (robust to illumination changes) to represent an image as a histogram.
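The bag-of-visual-words pipeline described above (local descriptors, a codebook, then a word histogram per image) can be sketched as follows. Real SIFT or SURF descriptors are not computed here; small synthetic vectors stand in for them, and the tiny k-means and all function names are assumptions for illustration only, not the method of [42].

```python
import numpy as np

def nearest_word(descriptors, codebook):
    """Index of the closest codebook entry (visual word) for each descriptor."""
    d = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def build_codebook(descriptors, k, iters=20, seed=0):
    """Tiny k-means: the k cluster centers become the visual vocabulary."""
    rng = np.random.default_rng(seed)
    start = rng.choice(len(descriptors), size=k, replace=False)
    codebook = descriptors[start].astype(float)
    for _ in range(iters):
        words = nearest_word(descriptors, codebook)
        for j in range(k):
            members = descriptors[words == j]
            if len(members):                     # keep the old center if a cluster is empty
                codebook[j] = members.mean(axis=0)
    return codebook

def bovw_histogram(descriptors, codebook):
    """Normalized count of how often each visual word occurs in one image."""
    words = nearest_word(descriptors, codebook)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()

# Stand-in descriptors (assumed data): two well-separated groups of 4-D vectors.
training = np.vstack([np.zeros((5, 4)), np.full((5, 4), 10.0)])
codebook = build_codebook(training, k=2)
image_descriptors = np.zeros((3, 4))
print(bovw_histogram(image_descriptors, codebook))
```

Two images can then be compared by any histogram distance. Note that the histogram discards where each word occurred in the image, which is the limitation the spatial methods in this section try to address.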

Another approach is spatial pyramid matching (see http://slazebni.cs.illinois.edu/slides/ima_poster.pdf).
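A minimal sketch of the spatial pyramid idea: the image is divided into progressively finer grids, a visual word histogram is computed for every cell, and the weighted histograms are concatenated. The inputs (a word index and a normalized position per keypoint) and the function name are assumptions; the level weights follow the usual scheme in which finer levels count more.

```python
import numpy as np

def spatial_pyramid(words, xy, vocab_size, levels=2):
    """Concatenate weighted per-cell word histograms for pyramid levels 0..levels.

    words: (n,) visual word index of each keypoint
    xy:    (n, 2) keypoint positions (x, y) scaled to [0, 1)
    """
    parts = []
    for level in range(levels + 1):
        cells = 2 ** level                                   # grid is cells x cells
        weight = 1 / 2 ** levels if level == 0 else 1 / 2 ** (levels - level + 1)
        cx = np.minimum((xy[:, 0] * cells).astype(int), cells - 1)
        cy = np.minimum((xy[:, 1] * cells).astype(int), cells - 1)
        flat = (cy * cells + cx) * vocab_size + words        # (cell, word) as one index
        hist = np.bincount(flat, minlength=cells * cells * vocab_size)
        parts.append(weight * hist)
    return np.concatenate(parts)

words = np.array([0, 1, 1])
xy = np.array([[0.1, 0.1], [0.9, 0.9], [0.9, 0.1]])
print(spatial_pyramid(words, xy, vocab_size=2, levels=1))
```

Two such vectors are typically compared with histogram intersection, so that matches found in small cells (similar words at similar positions) contribute more than matches found only at the whole-image level.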

Related literature:

[43] S. Lazebnik, C. Schmid, and J. Ponce, “Beyond bags of features: spatial pyramid matching for recognizing natural scene categories,” in Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Volume 2 (CVPR’06), pp. 2169–2178, IEEE, New York, NY, USA, June 2006. (This is the paper behind the link above.)

[44] Z. Mehmood, S. M. Anwar, N. Ali, H. A. Habib, and M. Rashid, “A novel image retrieval based on a combination of local and global histograms of visual words,” Mathematical Problems in Engineering, vol. 2016, Article ID 8217250, 12 pages, 2016.

Uses SIFT features and k-means clustering to build the codebooks.

[46] B. Zafar, R. Ashraf, N. Ali et al., “A novel discriminating and relative global spatial image representation with applications in CBIR,” Applied Sciences, vol. 8, no. 11, p. 2242, 2018.

Computes the global geometric relationship between pairs of words in the bag of visual words to achieve transformation invariance.

[47] N. Ali, B. Zafar, F. Riaz et al., “A hybrid geometric spatial image representation for scene classification,” PLoS One, vol. 13, no. 9, Article ID e0203339, 2018.

Hmm, in my humble opinion, this one splits the image into circular, square, and triangular regions, then extracts features and builds codebooks from them.

[48] B. Zafar, R. Ashraf, N. Ali, M. Ahmed, S. Jabbar, and S. A. Chatzichristofis, “Image classification by addition of spatial information based on histograms of orthogonal vectors,” PLoS One, vol. 13, no. 6, Article ID e0198175, 2018.

This one uses orthogonal vectors to encode positional information.

[51] H. Anwar, S. Zambanini, and M. Kampel, “A rotation-invariant bag of visual words model for symbols based ancient coin classification,” in Proceedings of the 2014 IEEE International Conference on Image Processing (ICIP), pp. 5257–5261, IEEE, Paris, France, October 2014.

[52] H. Anwar, S. Zambanini, and M. Kampel, “Efficient scale- and rotation-invariant encoding of visual words for image classification,” IEEE Signal Processing Letters, vol. 22, no. 10, pp. 1762–1765, 2015.

[53] R. Khan, C. Barat, D. Muselet, and C. Ducottet, “Spatial histograms of soft pairwise similar patches to improve the bag-of-visual-words model,” Computer Vision and Image Understanding, vol. 132, pp. 102–112, 2015.

[54] N. Ali, B. Zafar, M. K. Iqbal et al., “Modeling global geometric spatial information for rotation invariant classification of satellite images,”

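All of the above are various optimizations that address rotation, scaling, and similar issues.

 

The rest will be covered in the next post (Part 2), which focuses on

Section 6 Fusion of low-level features

Section 7 Local features

Section 8 Deep-learning-based retrieval

If you spot any mistakes in these three parts, corrections and comments are welcome.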

 

 

=======================================================

Personal GitHub: https://github.com/timcanby

 

 

 

 

 

 

 

 

 

 
