Question
In my main.cpp I have an excerpt:
Ptr<FastFeatureDetector> fastDetector = FastFeatureDetector::create(80, true);
while (true) {
    Mat image = // get grayscale image 1280x720
    timer.start();
    fastDetector->detect(image, keypoints);
    myfile << "FAST\t" << timer.end() << endl; // timer.end() is how many seconds elapsed since last timer.start()
    keypoints.clear();
    timer.start();
    for (int i = 3; i < image.rows - 3; i++)
    {
        for (int j = 3; j < image.cols - 3; j++)
        {
            if (inspectPoint(image.data, image.cols, i, j)) {
                // this block is never entered
                KeyPoint keypoint(j, i, 3); // KeyPoint takes (x, y, size): x is the column, y is the row
                keypoints.push_back(keypoint);
            }
        }
    }
    myfile << "Custom\t" << timer.end() << endl;
    myfile << endl;
    myfile.flush();
    ...
}
The contents of myfile:
FAST 0.000515495
Custom 0.00221361
FAST 0.000485697
Custom 0.00217653
FAST 0.000490001
Custom 0.00219044
FAST 0.000484373
Custom 0.00216329
FAST 0.000561184
Custom 0.00233214
Given those timings, one would expect inspectPoint() to be doing some real work. Here it is:
bool inspectPoint(const uchar* img, int cols, int i, int j) {
    uchar p = img[i * cols + j];
    uchar pt = img[(i - 3) * cols + j];
    uchar pr = img[i * cols + j + 3];
    uchar pb = img[(i + 3) * cols + j];
    uchar pl = img[i * cols + j - 3];
    return cols < pt - pr + pb - pl + i; // just random check so that the optimizer doesn't skip any calculations
}
I am using Visual Studio 2013 and the optimization is set to "Full Optimization (/Ox)".
As far as I know, the FAST algorithm visits every pixel. I would not expect it to process each pixel faster than inspectPoint() does.
How is the FAST detector so fast? Or rather, why is the nested loop so slow?
Answer 1:
From a quick look at the source code, FastFeatureDetector appears to be heavily optimized with SSE and OpenCL: github.com/Itseez/opencv/blob/master/modules/features2d/src/
SSE and OpenCL are not tied to one particular CPU model. SSE lets the CPU apply a single instruction (calculation) to multiple pieces of data simultaneously, so depending on the CPU's architecture it can improve speed by as little as 2x or by well beyond 4x. OpenCL can use the GPU, which can also give major performance boosts for certain image processing operations.
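To make the "single instruction on multiple pieces of data" point concrete, here is a minimal sketch of the idea. It is not OpenCV's code: the function names countBrighterScalar and countBrighterSse2 are made up for illustration, and it assumes an x86/x86-64 compiler that ships the SSE2 header <emmintrin.h>. Both functions count pixels brighter than a threshold. The scalar one compares one pixel per iteration, the SSE2 one compares 16 pixels per instruction.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <emmintrin.h>  // SSE2 intrinsics (assumption: x86 or x86-64 target)

// Hypothetical scalar baseline: one comparison per pixel.
std::size_t countBrighterScalar(const std::uint8_t* px, std::size_t n,
                                std::uint8_t threshold) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (px[i] > threshold) ++count;
    }
    return count;
}

// Hypothetical SSE2 version: 16 pixels are compared by a single instruction.
std::size_t countBrighterSse2(const std::uint8_t* px, std::size_t n,
                              std::uint8_t threshold) {
    assert(px != nullptr || n == 0);

    // SSE2 only offers a signed byte compare, so flip the top bit of both
    // operands; that maps unsigned ordering onto signed ordering.
    const __m128i bias = _mm_set1_epi8(static_cast<char>(0x80));
    const __m128i thr =
        _mm_xor_si128(_mm_set1_epi8(static_cast<char>(threshold)), bias);

    std::size_t count = 0;
    std::size_t i = 0;
    for (; i + 16 <= n; i += 16) {
        // Unaligned load of 16 consecutive pixels.
        const __m128i v =
            _mm_loadu_si128(reinterpret_cast<const __m128i*>(px + i));
        // Each byte lane becomes 0xFF where pixel > threshold, else 0x00.
        const __m128i gt = _mm_cmpgt_epi8(_mm_xor_si128(v, bias), thr);
        // Collapse the 16 lanes into a 16-bit mask, one bit per pixel.
        int mask = _mm_movemask_epi8(gt);
        // Count the set bits (clears the lowest set bit each iteration).
        while (mask != 0) {
            mask &= mask - 1;
            ++count;
        }
    }
    // Handle the remaining (fewer than 16) pixels one at a time.
    for (; i < n; ++i) {
        if (px[i] > threshold) ++count;
    }
    return count;
}
```

Both functions return the same count for any input. Only the number of pixels handled per instruction differs, and that difference is the kind of gain described above.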
Source: https://stackoverflow.com/questions/36749923/opencv-fast-detector