Question
I have an application where, given a reasonably large set of images (say 20K) and a query image, I want to find the most similar one. A reasonable approximation is acceptable.
To represent each image with enough precision, I'm using SIFT (a parallel version, to achieve fast computation as well).
Now, given the set of n SIFT descriptors (where usually 500 < n < 1000, depending on image size), which can be represented as an n × 128 matrix, from what I've seen in the literature there are two possible approaches for my case:
- Descriptor matching: we map each descriptor vector to a low-dimensional space and try to find an approximation of its most similar counterpart, for example through LSH. Then we increment the number of matches between the query image and the image that the matched descriptor belongs to. We iterate this process over all the descriptors. Finally, we return as the result the image with the highest number of descriptor matches (a minimal sketch of this voting scheme follows the list).
- Bag of Features: we create a histogram vector for each image following the BoF model. Supposing that we use k-means (where k = 128, for example), we obtain a k-dimensional vector for each image. Since k could be too large for efficient comparison, we can map it to a smaller (possibly binary) space through LSH again (as we did in approach 1). Finally, we return as the result the image with the most similar histogram (see the second sketch below). Notice that a big problem with this approach is that, as I discussed in this question, in order to quickly build the histogram we need to use LSH yet again (what a mess!).
I'm surprised that I couldn't find any comparison of these two approaches. My question is: what do we have to consider for each of them? Is there any research comparing these two approaches? The first method seems more efficient, and it looks feasible for a dataset of this size.
Source: https://stackoverflow.com/questions/37987863/similar-images-bag-of-features-visual-word-or-matching-descriptors