Extracting VLAD from SIFT Descriptors in VLFeat with Matlab


First, you need to obtain a dictionary of visual words, or to be more specific: cluster the SIFT features of all images using k-means clustering. In [1], a coarse clustering with e.g. 64 or 256 clusters is recommended.
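
The steps below assume that the SIFT descriptors of all database images have already been extracted into a cell array sift_descr, with one cell (a 128-by-M descriptor matrix) per image. As a minimal sketch of how such a cell array might be built (assuming a cell array image_files of file names, which is not part of the original answer):

% Extract SIFT descriptors for every database image (sketch)
sift_descr = cell(1, numel(image_files));
for k = 1:numel(image_files)
    I = single(rgb2gray(imread(image_files{k})));   % vl_sift expects a single-precision grayscale image
    [~, sift_descr{k}] = vl_sift(I);                % 128-by-M descriptor matrix for image k
end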

For that, we have to concatenate all descriptors into one matrix, which we can then pass to the vl_kmeans function. Further, we convert the descriptors from uint8 to single, as the vl_kmeans function requires the input to be either single or double.

all_descr = single([sift_descr{:}]);    % concatenate all descriptors into one 128-by-N matrix
centroids = vl_kmeans(all_descr, 64);   % 64 visual words, returned as a 128-by-64 matrix

Second, you have to create an assignment matrix of size NumberOfClusters-by-NumberOfDescriptors, which assigns each descriptor to a cluster. You have a lot of flexibility in creating this matrix: you can do soft or hard assignments, and you can use simple nearest-neighbor search, kd-trees, or other approximate or hierarchical nearest-neighbor schemes at your discretion.
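
For instance, a plain exact nearest-neighbor hard assignment (without a kd-tree) could look like the following sketch, which is not part of the original answer. Here descr stands for the 128-by-M single matrix of one image's descriptors, and the distance computation relies on implicit expansion (MATLAB R2016b or newer):

% Squared Euclidean distance from every centroid to every descriptor (64-by-M)
dists = sum(centroids.^2, 1)' + sum(descr.^2, 1) - 2 * (centroids' * descr);
[~, nn] = min(dists, [], 1);                       % nearest centroid per descriptor
assignments = zeros(64, numel(nn), 'single');
assignments(sub2ind(size(assignments), nn, 1:numel(nn))) = 1;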

In the tutorial, they use kd-trees, so let's stick to that. First, a kd-tree has to be built; this step belongs right after finding the centroids:

kdtree = vl_kdtreebuild(centroids);

Then, we are ready to construct the VLAD vector for each image. Thus, we have to go through all images again and calculate their VLAD vectors independently. First, we create the assignment matrix exactly as described in the tutorial. Then, we can encode the SIFT descriptors using the vl_vlad function. The resulting VLAD vector will have the size NumberOfClusters * SiftDescriptorSize, i.e. 64*128 in our example.

enc = zeros(64*128, numel(sift_descr));

for k=1:numel(sift_descr)

    % Create assignment matrix
    nn = vl_kdtreequery(kdtree, centroids, single(sift_descr{k}));
    assignments = zeros(64, numel(nn), 'single');
    assignments(sub2ind(size(assignments), nn, 1:numel(nn))) = 1;

    % Encode using VLAD
    enc(:, k) = vl_vlad(single(sift_descr{k}), centroids, assignments);
end

Finally, we have the high-dimensional VLAD vectors for all images in the database. Usually, you'll want to reduce the dimensionality of the VLAD descriptors e.g. using PCA.
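
As a minimal sketch of such a PCA step (assuming the enc matrix from above, more than 128 database images, and an arbitrary target dimension of 128, none of which is prescribed by the original answer):

X = enc - mean(enc, 2);          % center each dimension (implicit expansion, R2016b+)
[U, ~, ~] = svd(X, 'econ');      % principal directions as columns of U
enc_pca = U(:, 1:128)' * X;      % keep the first 128 components, giving a 128-by-N matrix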

Now, given a new image which is not in the database, you can extract its SIFT features using vl_sift, create the assignment matrix with vl_kdtreequery, and create the VLAD vector for that image using vl_vlad. You don't have to find new centroids or build a new kd-tree:

% Load image and extract SIFT features
new_image = imread('filename.jpg');
new_image = single(rgb2gray(new_image));
[~, new_sift] = vl_sift(new_image);

% Create assignment matrix
nn = vl_kdtreequery(kdtree, centroids, single(new_sift));
assignments = zeros(64, numel(nn), 'single');
assignments(sub2ind(size(assignments), nn, 1:numel(nn))) = 1;

% Encode using VLAD
new_vlad = vl_vlad(single(new_sift), centroids, assignments);
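
As a final sketch (not part of the original answer), the query can then be ranked against the database using plain Euclidean distances between the VLAD vectors; if PCA was applied to enc, the same projection has to be applied to new_vlad first:

% Squared Euclidean distance from the query to every database image
dists = sum((enc - new_vlad).^2, 1);
[~, ranking] = sort(dists, 'ascend');   % ranking(1) is the most similar database image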

[1] Arandjelovic, R., & Zisserman, A. (2013). All About VLAD. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 1578–1585. https://doi.org/10.1109/CVPR.2013.207
