Not getting what 'spatial weights' for HOG are

Question

I am using HOG for sunflower detection. I understand most of what HOG does now, but there are a few things in the final stages that I do not understand. (I am going through the MATLAB code from MathWorks.)

Let us assume we are using the Dalal-Triggs configuration: 8x8 pixels make 1 cell, 2x2 cells make 1 block, blocks overlap by 50% in both directions, and the histograms are quantized into 9 unsigned bins (that is, from 0 to 180 degrees). Finally, our image here is 64x128 pixels.

Let us say that we are on the first block. This block has 4 cells. I understand that we weight each pixel's orientation vote by its gradient magnitude. I also understand that we weight the votes further by a Gaussian centered on the block.

So far so good.

However in the MATLAB implementation, they have an additional step, whereby they create a 'spatial' weight:

If we dive into this function, it looks like this:

Finally, the function 'computeLowerHistBin' looks like this:

function [x1, b1] = computeLowerHistBin(x, binWidth)
% Bin index
width    = single(binWidth);
invWidth = 1./width;
bin      = floor(x.*invWidth - 0.5);

% Bin center x1
x1 = width * (bin + 0.5);

% add 2 to get to 1-based indexing
b1 = int32(bin + 2);
end

Now, I believe that those 'spatial' weights are being used during the tri-linear interpolation part later on... but what I do not get is just how exactly they are being computed, or the logic behind that code. I am completely lost on this issue.

Note: I understand the need for the tri-linear interpolation, and (I think) how it works. What I do not understand is why we need those 'spatial weights', and what the logic behind their computation here is.

Thanks.

Answer 1:

This code is pre-computing the spatial weights for the trilinear interpolation. Take a look at the equation here for trilinear interpolation:

HOG Trilinear Interpolation of Histogram Bins

There you see things like (x-x1)/bx, (y-y1)/by, (1 - (x-x1)/bx), etc. In the code, wx1 and wy1 correspond to:

wx1 = (1 - (x-x1)/bx)
wy1 = (1 - (y-y1)/by)

Here, x1 and y1 are the centers of the histogram bins in the X and Y directions. It's easier to describe these things in 1D. In 1D, a value x falls between 2 bin centers, x1 <= x < x2. It doesn't matter exactly which bin (1 or 2) it belongs to. The important thing is to figure out the fraction of x that belongs to x1; the rest belongs to x2. Taking the distance from x to x1 and dividing by the width of the bin gives a fractional distance between 0 and 1. 1 minus that is the fraction that belongs to bin 1. So if x == x1, wx1 is 1. And if x == x2, wx1 is zero, because x2 - x1 == bx (the width of a bin).
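The 1D case described above can be sketched in a few lines. This is only an illustration: the bin width, bin centers, and sample position are made-up values, and wx2 is a name introduced here for the complementary weight.

```matlab
% 1D linear interpolation weights between two neighbouring bin centers.
% All numbers are illustrative assumptions, not values from the MathWorks code.
bx = 20;          % bin width
x1 = 10;          % lower bin center (x1 <= x < x2)
x2 = x1 + bx;     % upper bin center
x  = 15;          % sample position between the two centers

wx1 = 1 - (x - x1) / bx;   % fraction assigned to the bin centered at x1
wx2 = (x - x1) / bx;       % remainder (hypothetical name), assigned to the bin at x2

fprintf('wx1 = %.2f, wx2 = %.2f\n', wx1, wx2);   % prints wx1 = 0.75, wx2 = 0.25
```

The two weights always sum to 1, so the sample's total vote is preserved and only split between the two bins.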

Going back to the code: the part that creates the 4 matrices is just pre-computing all the products of the weights needed to interpolate every pixel in a HOG block. That is why each one is a matrix of weights: each element in the matrix is for one of the pixels in the HOG block.

For example, if you look at the equation for the weights for h(x1, y2, ~), you'll see these 2 weights for x and y (ignoring the z component):

(1 - (x-x1)/bx) * ((y-y1)/by)

Going back to the code, this multiplication is pre-computed for every pixel in the block using:

weights.x1y2 = (1-wy1)' * wx1;

where

(1-wy1) == (y - y1)/by

The same logic applies to the other weight matrices.
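As a rough sketch of how all four matrices could be built with outer products, the snippet below assumes a 16x16-pixel block of 2x2 cells and pixel centers at 0.5, 1.5, and so on. Only the x1y2 line comes from the answer above; the other three field names, the helper lowerCenter, and the coordinate convention are assumptions made for illustration, not a copy of the MathWorks function.

```matlab
% Assumed setup: a block of 2x2 cells, each cell 8x8 pixels.
cellSize  = 8;
blockSize = 16;

% Pixel-center coordinates inside the block (assumed convention).
x = 0.5:1:(blockSize - 0.5);
y = x;

% Lower cell center for each coordinate, same formula as computeLowerHistBin.
lowerCenter = @(p, width) width .* (floor(p ./ width - 0.5) + 0.5);
x1 = lowerCenter(x, cellSize);
y1 = lowerCenter(y, cellSize);

% 1D weights toward the lower center in each direction.
wx1 = 1 - (x - x1) ./ cellSize;
wy1 = 1 - (y - y1) ./ cellSize;

% Outer products: one weight per pixel, rows index y and columns index x.
weights.x1y1 = wy1'       * wx1;
weights.x2y1 = wy1'       * (1 - wx1);
weights.x1y2 = (1 - wy1)' * wx1;
weights.x2y2 = (1 - wy1)' * (1 - wx1);

% The four spatial weights of any pixel sum to 1.
total = weights.x1y1 + weights.x2y1 + weights.x1y2 + weights.x2y2;
disp(max(abs(total(:) - 1)) < 1e-12);
```

Because the weights depend only on pixel position inside the block, they can be computed once and reused for every block in the image.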

As for the code in "computeLowerHistBin", it's just finding the x1 in the trilinear interpolation equation, where x1 <= x < x2 (same for y1). There are probably a bunch of ways to solve this problem given a pixel location x and the width of a bin bx as long as you satisfy x1 <= x < x2.

For example, "|" indicate bin edges. "o" are the bin centers.

-20             0              20               40
 |------o-------|-------o-------|-------o-------|
       -10              10              30

If x = [2 9 11] and the bin width is 20, the lower bin center x1 is [-10 -10 10].
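The example above can be checked directly. The sketch below rewrites the formula from computeLowerHistBin as an anonymous function (the name lowerBinCenter is made up here) so it runs as a plain script, and uses a bin width of 20 to match the diagram.

```matlab
% Same arithmetic as computeLowerHistBin, as an anonymous function.
% lowerBinCenter is a hypothetical name used only for this check.
lowerBinCenter = @(x, binWidth) binWidth .* (floor(x ./ binWidth - 0.5) + 0.5);

x  = [2 9 11];
x1 = lowerBinCenter(x, 20);   % lower bin center for each sample

disp(x1);                     % displays -10 -10 10
```

Subtracting 0.5 before the floor is what shifts the rounding from bin edges to bin centers: 9 still lies below the center at 10, so its lower center is -10, while 11 has just passed it.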

Answer 2:

The idea here is that each pixel contributes not only to the histogram of its own cell, but also, to some degree, to the neighboring cell. These contributions are weighted differently, depending on how close the pixel is to the edge of the cell. The closer you are to an edge of your cell, the more you contribute to the corresponding neighboring cell, and the less you contribute to your own cell.
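To make this concrete in one dimension, here is a small sketch for a single 8-pixel-wide cell. The cell position, the pixel-center convention, and the variable names are assumptions chosen for illustration.

```matlab
% One cell spanning [8, 16) along one axis, with its center at 12 (assumed layout).
cellSize = 8;
center   = 12;
x        = 8.5:1:15.5;                  % pixel centers inside the cell

d         = abs(x - center) ./ cellSize; % normalized distance to the cell's own center
wOwn      = 1 - d;                       % share kept by the pixel's own cell
wNeighbor = d;                           % share passed to the nearest neighbouring cell

% Rows: pixel position, own-cell weight, neighbour-cell weight.
disp([x; wOwn; wNeighbor]);
```

Pixels next to the cell center keep almost all of their vote, while pixels at the cell border give close to half of it to the adjacent cell.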

Source: https://stackoverflow.com/questions/26344764/not-getting-what-spatial-weights-for-hog-are

Tags

matlab

image-processing

computer-vision

feature-extraction

matlab-cvst