Question
I am developing an application where I am using SIFT + RANSAC and homography to find an object (OpenCV, C++/Java). The problem I am facing is that when there are many outliers, RANSAC performs poorly.
For this reason I would like to try what the author of SIFT said works quite well: voting.
I have read that we should vote in a 4-dimensional feature space, where the 4 dimensions are:
- Location [x, y] (some call it translation)
- Scale
- Orientation
While with OpenCV it is easy to get a match's scale and orientation with:
cv::KeyPoint::octave
cv::KeyPoint::angle
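For reference, a minimal sketch of reading those fields (cv::KeyPoint also exposes .size, which gives a scale value directly, while OpenCV's SIFT packs extra data into .octave):

    #include <cstdio>
    #include <vector>
    #include <opencv2/core.hpp>

    void dumpKeypoints(const std::vector<cv::KeyPoint>& kps) {
        for (const cv::KeyPoint& kp : kps) {
            // kp.pt    : location (x, y) in the image
            // kp.size  : diameter of the meaningful neighborhood (direct scale value)
            // kp.angle : orientation in degrees
            // kp.octave: pyramid octave (OpenCV's SIFT packs octave + layer into it)
            std::printf("x=%.1f y=%.1f size=%.1f angle=%.1f octave=%d\n",
                        kp.pt.x, kp.pt.y, kp.size, kp.angle, kp.octave);
        }
    }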
I am having a hard time understanding how I can calculate the location.
I have found an interesting slide where, with only one match, we are able to draw a bounding box. But I don't get how I could draw that bounding box with just one match. Any help?
Answer 1:
You are looking for the largest set of matched features that fit a geometric transformation from image 1 to image 2. In this case, it is the similarity transformation, which has 4 parameters: translation (dx, dy), scale change ds, and rotation d_theta.
Let's say you have matched two features: f1 from image 1 and f2 from image 2. Let (x1, y1) be the location of f1 in image 1, let s1 be its scale, and let theta1 be its orientation. Similarly you have (x2, y2), s2, and theta2 for f2.
The translation between the two features is (dx, dy) = (x2 - x1, y2 - y1).
The scale change between the two features is ds = s2 / s1.
The rotation between the two features is d_theta = theta2 - theta1.
So, dx, dy, ds, and d_theta are the dimensions of your Hough space. Each bin corresponds to a similarity transformation.
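A minimal voting sketch under these definitions (the bin widths and the map-based accumulator are illustrative choices; Lowe's paper suggests broad bins, e.g. 30 degrees for orientation and a factor of 2 for scale):

    #include <cmath>
    #include <map>
    #include <tuple>
    #include <vector>
    #include <opencv2/core.hpp>

    // One vote per tentative match into a coarsely quantized 4D Hough space.
    using Bin = std::tuple<int, int, int, int>;

    Bin toBin(const cv::KeyPoint& f1, const cv::KeyPoint& f2) {
        float dx = f2.pt.x - f1.pt.x;
        float dy = f2.pt.y - f1.pt.y;
        float ds = f2.size / f1.size;            // scale change
        float dtheta = f2.angle - f1.angle;      // rotation, in degrees
        return Bin(cvRound(dx / 32.f),           // 32-px translation bins
                   cvRound(dy / 32.f),
                   cvRound(std::log2(ds)),       // factor-of-2 scale bins
                   cvRound(dtheta / 30.f));      // 30-degree rotation bins
    }

    // Assumes matches were produced as matcher.match(descriptors1, descriptors2),
    // so queryIdx indexes image 1 and trainIdx indexes image 2.
    Bin bestBin(const std::vector<cv::KeyPoint>& k1,
                const std::vector<cv::KeyPoint>& k2,
                const std::vector<cv::DMatch>& matches) {
        std::map<Bin, int> votes;
        for (const cv::DMatch& m : matches)
            ++votes[toBin(k1[m.queryIdx], k2[m.trainIdx])];
        Bin best{};
        int bestCount = -1;
        for (const auto& kv : votes)             // the maximum bin wins
            if (kv.second > bestCount) { best = kv.first; bestCount = kv.second; }
        return best;
    }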
Once you have performed Hough voting, and found the maximum bin, that bin gives you a transformation from image 1 to image 2. One thing you can do is take the bounding box of image 1 and transform it using that transformation: apply the corresponding translation, rotation and scaling to the corners of the image. Typically, you pack the parameters into a transformation matrix, and use homogeneous coordinates. This will give you the bounding box in image 2 corresponding to the object you've detected.
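A hedged sketch of that last step, packing (dx, dy, ds, d_theta) into a homogeneous similarity matrix and mapping image 1's corners (note the per-match dx, dy above only approximate this matrix's translation unless ds is near 1 and d_theta near 0; a least-squares refit over the winning bin's matches usually gives a cleaner box):

    #include <cmath>
    #include <vector>
    #include <opencv2/core.hpp>

    std::vector<cv::Point2f> transformedBox(float dx, float dy, float ds,
                                            float dthetaDeg, cv::Size img1Size) {
        float t = dthetaDeg * (float)CV_PI / 180.f;
        // Homogeneous similarity: scale+rotation block plus translation column.
        cv::Matx33f H(ds * std::cos(t), -ds * std::sin(t), dx,
                      ds * std::sin(t),  ds * std::cos(t), dy,
                      0.f,               0.f,              1.f);
        std::vector<cv::Point2f> corners = {
            {0.f, 0.f},
            {(float)img1Size.width, 0.f},
            {(float)img1Size.width, (float)img1Size.height},
            {0.f, (float)img1Size.height}};
        std::vector<cv::Point2f> box;
        cv::perspectiveTransform(corners, box, H);  // map corners into image 2
        return box;
    }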
Answer 2:
When using the Hough transform, you create a signature storing the displacement vectors of every feature from the template centroid (either (w/2, h/2) or computed with the help of central moments).
E.g. for 10 SIFT features found on the template, their positions relative to the template's centroid form a vector<{a,b}>. Now, let's search for this object in a query image: every SIFT feature found in the query image that matches one of the template's 10 casts a vote for its corresponding centroid:
votemap(feature.x - a, feature.y - b) += 1
where (a, b) is the displacement vector stored for that particular feature.
If several of those features successfully vote for the same point (clustering is essential), you have found an object instance.
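A minimal sketch of both steps, assuming the template descriptors were the "train" side of the matcher (all names illustrative):

    #include <vector>
    #include <opencv2/core.hpp>

    // Offline: store each template feature's displacement from the centroid.
    std::vector<cv::Point2f> buildSignature(const std::vector<cv::KeyPoint>& tmplKps,
                                            cv::Point2f centroid) {
        std::vector<cv::Point2f> sig;
        for (const cv::KeyPoint& kp : tmplKps)
            sig.push_back(kp.pt - centroid);     // the (a, b) per feature
        return sig;
    }

    // Online: every matched query feature votes for where the centroid should be.
    cv::Mat castVotes(const std::vector<cv::KeyPoint>& queryKps,
                      const std::vector<cv::DMatch>& matches,
                      const std::vector<cv::Point2f>& sig, cv::Size querySize) {
        cv::Mat votemap = cv::Mat::zeros(querySize, CV_32F);
        for (const cv::DMatch& m : matches) {
            cv::Point2f v = queryKps[m.queryIdx].pt - sig[m.trainIdx];
            int x = cvRound(v.x), y = cvRound(v.y);
            if (x >= 0 && x < votemap.cols && y >= 0 && y < votemap.rows)
                votemap.at<float>(y, x) += 1.f;  // votemap(x - a, y - b) += 1
        }
        return votemap;                          // a strong peak = object instance
    }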
Signature and voting are reverse procedures. Let's assume a stored displacement V = (-20, -10). During search in the novel image, when a match is found, we read its orientation and size and cast an adjusted vote: the stored displacement is scaled and rotated first, V' = 0.5 * R(-10°) * V, and the vote lands at the feature position minus V', because this instance is half the size and rotated by -10 degrees.
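A short sketch of that adjusted vote, assuming s is the matched scale ratio and dthetaDeg the rotation difference in degrees:

    #include <cmath>
    #include <opencv2/core.hpp>

    // V' = s * R(dtheta) * V; the vote lands at the feature position minus V'.
    cv::Point2f adjustedVote(cv::Point2f featurePos, cv::Point2f V,
                             float s, float dthetaDeg) {
        float t = dthetaDeg * (float)CV_PI / 180.f;
        cv::Point2f Vp(s * (V.x * std::cos(t) - V.y * std::sin(t)),
                       s * (V.x * std::sin(t) + V.y * std::cos(t)));
        return featurePos - Vp;   // e.g. V = (-20, -10), s = 0.5, dtheta = -10
    }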
Answer 3:
To complete Dima's answer, one needs to add that the 4D Hough space is quantized into a (possibly small) number of 4D boxes, where each box corresponds to the similarity given by its center.
Then, for each possible similarity obtained via a tentative matching of features, add 1 to the corresponding box (or cell) in the 4D space. The output similarity is given by the cell with the most votes.
In order to compute the transform from 1 match, just use Dima's formulas from his answer. For several pairs of matches, you may need a least-squares fit.
Finally, the transform can be applied with the function cv::warpPerspective(), where the third row of the perspective matrix is set to [0, 0, 1].
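A sketch of that pipeline, assuming OpenCV >= 3.2: cv::estimateAffinePartial2D performs the (robust) least-squares similarity fit, and padding its 2x3 result with a [0, 0, 1] third row yields the perspective matrix:

    #include <vector>
    #include <opencv2/calib3d.hpp>   // cv::estimateAffinePartial2D
    #include <opencv2/imgproc.hpp>   // cv::warpPerspective

    cv::Mat applySimilarity(const cv::Mat& img1,
                            const std::vector<cv::Point2f>& pts1,
                            const std::vector<cv::Point2f>& pts2,
                            cv::Size outSize) {
        // Similarity fit: rotation + uniform scale + translation (2x3, CV_64F).
        cv::Mat A = cv::estimateAffinePartial2D(pts1, pts2);
        cv::Mat H = cv::Mat::eye(3, 3, CV_64F); // third row stays [0, 0, 1]
        A.copyTo(H(cv::Rect(0, 0, 3, 2)));
        cv::Mat warped;
        cv::warpPerspective(img1, warped, H, outSize);
        return warped;
    }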
Source: https://stackoverflow.com/questions/15938793/sift-matches-and-recognition