How does the Lowe's ratio test work?

鱼传尺愫 2020-12-30 06:53

Suppose I have a set of N images and I have already computed the SIFT descriptors of each image. I now would like to compute the matches between the different features.

3 Answers
  •  一整个雨季
    2020-12-30 07:50

    Short version: each keypoint of the first image is matched with a number of keypoints from the second image. We keep the 2 best matches for each keypoint (best matches = the ones with the smallest distance measurement). Lowe's test checks that the two distances are sufficiently different. If they are not, then the keypoint is eliminated and will not be used for further calculations.

    Long version:

    David Lowe proposed a simple method for filtering keypoint matches by eliminating matches when the second-best match is almost as good. Do note that, although popularized in the context of computer vision, this method is not tied to CV. Here I describe the method, and how it is implemented/applied in the context of computer vision.

    Let's suppose that L1 is the set of keypoints of image 1, each keypoint having a descriptor that lists information about the keypoint, the nature of that information depending on the descriptor algorithm that was used. And L2 is the set of keypoints for image 2. A typical matching algorithm works by finding, for each keypoint in L1, the closest match in L2. If using Euclidean distance, as in Lowe's paper, this means the keypoint from set L2 that has the smallest Euclidean distance from the keypoint in L1.
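    As a minimal sketch of that matching step (pure Python, with made-up 3-dimensional descriptors; real SIFT descriptors are 128-dimensional), nearest-neighbor matching by Euclidean distance looks like this:

```python
import math

def euclidean(a, b):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Toy descriptors standing in for the keypoint sets of the two images
L1 = [(0.1, 0.2, 0.9), (0.8, 0.1, 0.3)]
L2 = [(0.1, 0.25, 0.85), (0.7, 0.2, 0.3), (0.5, 0.5, 0.5)]

# For each keypoint in L1, find the index of the closest descriptor in L2
matches = []
for i, d1 in enumerate(L1):
    ranked = sorted(range(len(L2)), key=lambda j: euclidean(d1, L2[j]))
    matches.append((i, ranked[0]))

print(matches)  # -> [(0, 0), (1, 1)]
```

    With OpenCV the same ranking is what `knnMatch` does for you; the loop above is just the brute-force version spelled out.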

    Here we could be tempted to just set a threshold and eliminate all the pairings where the distance is above that threshold. But it's not that simple, because not all variables inside the descriptors are equally "discriminant": two keypoints could have a small distance measurement because most of the variables inside their descriptors have similar values, but then those variables could be irrelevant to the actual matching. One could always add weighting to the variables of the descriptors so that the more discriminating traits "count" more. Lowe proposes a much simpler solution, described below.

    First, we match each keypoint in L1 with its two closest keypoints in L2. Working from the assumption that a keypoint in image 1 can't have more than one equivalent in image 2, we deduce that those two matches can't both be right: at least one of them is wrong. Following Lowe's reasoning, the match with the smallest distance is the "good" match, and the match with the second-smallest distance is the equivalent of random noise, a base rate of sorts. If the "good" match can't be distinguished from noise, then the "good" match should be rejected because it does not bring anything interesting, information-wise. So the general principle is that there needs to be enough difference between the best and second-best matches.

    How the concept of "enough difference" is operationalized is important: Lowe uses a ratio of the two distances, often expressed as such:

    if distance1 < distance2 * a_constant then ....
    

    Where distance1 is the distance between the keypoint and its best match, and distance2 is the distance between the keypoint and its second-best match. The use of a "smaller than" sign can be somewhat confusing, but that becomes obvious when taking into consideration that a smaller distance means that the point is closer. In the OpenCV world, the knnMatch function will return the matches from best to worst, so the 1st match will have a smaller distance. The question is really "how much smaller?" To figure that out we multiply distance2 by a constant that has to be between 0 and 1, thus decreasing the value of distance2. Then we look at distance1 again: is it still smaller than distance2? If it is, then it passed the test and will be added to the list of good points. If not, it must be eliminated.
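    Put together, a sketch of the filter itself (pure Python; the pair structure here is made up for illustration, but with OpenCV you would feed the pairs returned by `knnMatch(..., k=2)` through the same comparison; Lowe's paper rejects matches whose distance ratio exceeds 0.8):

```python
RATIO = 0.8  # Lowe's constant: reject if distance1 >= 0.8 * distance2

def ratio_test(pairs, ratio=RATIO):
    """Keep a best match only if it is sufficiently better than the
    second-best one, i.e. distance1 < ratio * distance2."""
    return [best for best, second in pairs if best[1] < ratio * second[1]]

# Each entry: ((matched_index, best_distance), (matched_index, second_best_distance))
pairs = [
    ((4, 0.12), (9, 0.55)),  # clearly better than the runner-up -> kept
    ((2, 0.40), (7, 0.45)),  # nearly indistinguishable -> rejected
]

good = ratio_test(pairs)
print(good)  # -> [(4, 0.12)]
```

    With OpenCV descriptors the loop is the same, only iterating over the `DMatch` pairs and comparing their `.distance` attributes.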

    So that explains the "smaller than" part, but what about the multiplication? Since we are looking at the difference between the distances, why not just use an actual mathematical difference between distance1 and distance2? Although technically we could, the resulting difference would be expressed in absolute terms, and would be too dependent on the variables inside the descriptors, the type of distance measurement that we use, etc. What if the code for extracting descriptors changes, affecting all distance measurements? In short, doing distance1 - distance2 would be less robust, would require frequent tweaking and would make methodological comparisons more complicated. It's all about the ratio.
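    A quick numeric illustration of that robustness argument (a toy example; the "rescaling" stands in for any change that multiplies all distances by the same factor, such as a different descriptor normalization):

```python
d1, d2 = 0.25, 0.5       # best and second-best distances
scale = 10.0             # imagine every distance gets rescaled by this factor
D1, D2 = d1 * scale, d2 * scale

# The absolute difference changes with the scale...
print(d2 - d1, D2 - D1)  # -> 0.25 2.5

# ...but the ratio, and therefore the test's verdict, does not
ratio = 0.8
print(d1 < ratio * d2, D1 < ratio * D2)  # -> True True
```

    An absolute-difference threshold tuned for the first pair would have to be re-tuned after the rescaling; the ratio test passes or fails identically in both cases.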

    Take-away message: Lowe's solution is interesting not only because of its simplicity, but because it is in many ways algorithm-agnostic.
