How to store Hierarchical K-Means tree for a large number of images, using Opencv?


Question


I am trying to make a program that will find similar images from a dataset of images. The steps are

  • extract SURF descriptors for all images
  • store the descriptors
  • Apply knn on the stored descriptors
  • Match the stored descriptors to the query image descriptor using kNN

Now, each image's SURF descriptors will be stored in a hierarchical k-means tree. Do I store each tree as a separate file, or is it possible to build some sort of single tree containing all the images' descriptors and update it as images are added to the dataset?
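The matching step in the list above can be sketched with plain NumPy, using random arrays as stand-ins for real SURF descriptors (which are 64-dimensional by default); in practice they would come from a detector such as `cv2.xfeatures2d.SURF_create()`:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for SURF descriptors (64-D float vectors).
stored = rng.random((500, 64)).astype(np.float32)  # database descriptors
query = rng.random((20, 64)).astype(np.float32)    # query image descriptors

# Brute-force kNN: squared Euclidean distance from every query row
# to every stored row, then take the two nearest neighbours.
d2 = ((query[:, None, :] - stored[None, :, :]) ** 2).sum(axis=2)
nn = np.argsort(d2, axis=1)[:, :2]

# Lowe's ratio test: keep a match only if the best neighbour is
# clearly closer than the second best.
best = np.take_along_axis(d2, nn, axis=1)
good = np.sqrt(best[:, 0]) < 0.75 * np.sqrt(best[:, 1])
```

A real system would replace the brute-force distance matrix with the tree-based search discussed below; the ratio-test logic stays the same.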

This is the paper I am basing the program on.


Answer 1:


Use a KD-Tree instead. You will be able to build the hierarchical K-dimensional tree; you just need to figure out what sort of information is sent down the tree to be stored. You can save the vectors/descriptors of the images to disk and load them into the KD-Tree every time you start up your program. Newly computed vectors/descriptors can be sent both to the tree and to the disk.

To Summarize

  • Create descriptors
  • Feed the new descriptors to the KD-Tree
  • Save the same descriptors to the disk
  • On every restart, load all the descriptors to the tree
  • Query the tree to give you the best match
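The steps above can be sketched as follows, assuming SciPy's `cKDTree` as the KD-tree implementation and random arrays as stand-ins for real descriptors; the key point is that only the raw descriptors are persisted, since the tree is cheap to rebuild:

```python
import os
import tempfile
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(1)
store = os.path.join(tempfile.mkdtemp(), "descriptors.npy")

# Stand-ins for image descriptors (e.g. 64-D SURF vectors).
descriptors = rng.random((1000, 64)).astype(np.float32)
np.save(store, descriptors)      # persist the raw vectors, not the tree

# On start-up: reload the descriptors and rebuild the KD-tree.
loaded = np.load(store)
tree = cKDTree(loaded)

# New images: send the fresh descriptors both to the tree and to disk.
new = rng.random((50, 64)).astype(np.float32)
all_desc = np.vstack([loaded, new])
np.save(store, all_desc)
tree = cKDTree(all_desc)         # rebuilding is cheap next to re-extraction

# Query: index of the nearest stored descriptor for each query vector.
dist, idx = tree.query(rng.random((5, 64)).astype(np.float32), k=1)
```

OpenCV's FLANN module offers an equivalent KD-tree index if you prefer to stay inside one library.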

Hope this helps




Answer 2:


Are you quite sure you want to do it with SURF descriptors? I'm working on a similar application myself, based on this paper (Nister, Stewenius), and they swear SIFT descriptors are the way to go. But I guess you could do it with any other descriptors as well.

Looking at the paper you referenced, it is newer than the work I linked to, but it does not reference either the Nister paper or this work (Sivic, Zisserman), which are, to my knowledge, the base works for all content-based image retrieval problems.

To better understand the problem, before I started implementing it, I first read Sivic, Zisserman to get the general idea behind the system. They apply only a simple clustering after extracting all the SIFT descriptors from all the features. They use two different types of features for better accuracy: Shape Adapted (centered on corner-like features) and Maximally Stable (corresponding to blobs of high contrast - you can look them up in this paper (Matas et al.)). The scalability of their system is not all that good because each feature is stored directly, but they introduced the concept of inverted files, a technique from text retrieval (you can read about its basics here), which simplifies the look-up process significantly.
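The inverted-file idea mentioned above is just a map from each visual word to the set of images that contain it, so candidate images for a query are found without scanning the whole database. A minimal sketch, with made-up word IDs and image names:

```python
from collections import defaultdict

# Each image's "signature": the set of visual words (cluster IDs) it contains.
images = {"img0": {3, 7, 11}, "img1": {7, 42}, "img2": {3, 42, 99}}

# Inverted file: visual word -> set of images containing that word.
inverted = defaultdict(set)
for name, words in images.items():
    for w in words:
        inverted[w].add(name)

# Look-up: candidate images for a query containing words 7 and 99.
candidates = inverted[7] | inverted[99]
```

Real systems also weight each word (e.g. by TF-IDF, as Sivic and Zisserman do) rather than treating all words equally.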

After conquering that work, I recommend going on to Nister, Stewenius, where they introduce the concept of hierarchical k-means clustering in L levels for storing the features, as well as for the later search of the image database. Now, unless I am very much mistaken, you don't store each descriptor as a separate tree. Instead, you make the tree based on the existing features (where the cluster centers in each level are actually the representative, "central", features for each cluster). Once the tree is constructed to the required depth (they recommend 10 clusters over 6 levels), the cluster centers at the last level are each representative of a very small number of features - and thus, you can actually forget all the original features! (or at least, their descriptors).

Each original feature can be represented by the corresponding cluster center, and instead of descriptors, for each image you only need to store the information about which cluster centers - features - it contains. This is much cheaper, since you only need to store one or two integers per feature - coding its path through the tree. The simplest way to see it is if you just code the number of the cluster that the feature belongs to at each level - there are 10 of them (4 bits) - for each level (6 of them, 4*6 < 32 bits, so it fits into an integer). You can of course implement the actual coding any way you see fit. Oh, and they also use SIFT descriptors on MSER regions.
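The path-coding trick described above (10 clusters per level, 4 bits each, 6 levels packed into one integer) might look like this; the function names are just illustrative:

```python
BRANCH = 10  # clusters per level; an index 0-9 fits in 4 bits
LEVELS = 6   # tree depth; 6 * 4 = 24 bits, so the code fits in an int

def encode_path(path):
    """Pack the cluster index chosen at each level into one integer."""
    assert len(path) == LEVELS and all(0 <= p < BRANCH for p in path)
    code = 0
    for p in path:
        code = (code << 4) | p   # shift in 4 bits per level
    return code

def decode_path(code):
    """Recover the per-level cluster indices from the packed integer."""
    path = []
    for _ in range(LEVELS):
        path.append(code & 0xF)  # low 4 bits = deepest remaining level
        code >>= 4
    return path[::-1]
```

Storing one such integer per feature replaces the 64- or 128-dimensional descriptor entirely, which is where the memory savings come from.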

Also, if the images you use for building a vocabulary tree are representative (e.g. you're working on a dataset of open-space pictures and you build the tree from just a representative portion of the images, but you know there are no pictures of industrial plant workplaces in the rest of the dataset), you can add new pictures very fast. The only thing you need to do to add any new picture to the dataset is determine which of the calculated cluster centers represent the image features best (as mentioned before, the last level of cluster centers is quite precise) and store the information about those cluster centers (the before-mentioned integers). Looking up the cluster centers should be really fast - there are only 10 comparisons at each of the 6 levels.
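Quantizing a new image's descriptors down the tree could be sketched as below. This is a toy version: the cluster centres are random stand-ins, whereas in a real vocabulary tree each node's centres come from running k-means on the descriptors assigned to that node during training.

```python
import numpy as np

BRANCH, LEVELS, DIM = 10, 6, 64
rng = np.random.default_rng(2)

# Fake tree of centres keyed by path prefix (stand-in for trained k-means).
centres = {}
def get_centres(prefix):
    if prefix not in centres:
        centres[prefix] = rng.random((BRANCH, DIM))
    return centres[prefix]

def quantize(desc):
    """Descend the tree: 10 distance comparisons at each of the 6 levels."""
    path = []
    for _ in range(LEVELS):
        c = get_centres(tuple(path))
        path.append(int(np.argmin(((c - desc) ** 2).sum(axis=1))))
    return tuple(path)

# An image's signature: the set of leaves its descriptors fall into.
image_descriptors = rng.random((30, DIM))
signature = {quantize(d) for d in image_descriptors}
```

Only `signature` (or its packed-integer form) needs to be stored per image; the descriptors themselves can be discarded after quantization.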

Hope this is actually useful to somebody someday, since the question is just over a year old. :)



Source: https://stackoverflow.com/questions/5521438/how-to-store-hierarchical-k-means-tree-for-a-large-number-of-images-using-openc
