I am working on a project for estimating a UAV's location using an optical-flow algorithm. I am currently using cv::calcOpticalFlowFarneback for this purpose.
My hardware is an Odroid U3 that will eventually be connected to the UAV flight controller.
The problem is that this method is really heavy for this hardware, and I am looking for other ways to optimize / accelerate it.
Things that I've already tried:
- Reducing resolution to 320x240 or even 160x120.
- Using OpenCV TBB (compiled using WITH_TBB=ON BUILD_TBB=ON and adding -ltbb).
- Changing optical-flow parameters as suggested here
Adding the relevant part of my code:
int opticalFlow(){
    // capture from camera
    VideoCapture cap(0);
    if( !cap.isOpened() )
        return -1;

    // Set Resolution - The Default Resolution Is 640 x 480
    cap.set(CV_CAP_PROP_FRAME_WIDTH,WIDTH_RES);
    cap.set(CV_CAP_PROP_FRAME_HEIGHT,HEIGHT_RES);

    Mat flow, cflow, undistortFrame, processedFrame, origFrame, croppedFrame;
    UMat gray, prevgray, uflow;

    currLocation.x = 0;
    currLocation.y = 0;

    // for each frame calculate optical flow
    for(;;)
    {
        // take out frame - still distorted
        cap >> origFrame;

        // Convert to gray
        cvtColor(origFrame, processedFrame, COLOR_BGR2GRAY);

        // rotate image - perspective transformation
        rotateImage(processedFrame, gray, eulerFromSensors.roll, eulerFromSensors.pitch, 0, 0, 0, 1, cameraMatrix.at<double>(0,0),
                    cameraMatrix.at<double>(0,2),cameraMatrix.at<double>(1,2));

        if( !prevgray.empty() )
        {
            // calculate flow
            calcOpticalFlowFarneback(prevgray, gray, uflow, 0.5, 3, 10, 3, 3, 1.2, 0);
            uflow.copyTo(flow);

            // get average
            calcAvgOpticalFlow(flow, 16, corners);

            /*
            Some other calculations
            .
            .
            .
            Updating currLocation struct
            */
        }

        // break conditions
        if(waitKey(1)>=0)
            break;
        if(end_run)
            break;

        std::swap(prevgray, gray);
    }
    return 0;
}
Notes:
- I've run callgrind and the bottleneck is, as expected, the calcOpticalFlowFarneback function.
- I checked the CPU core load while running the program, and it is not using all 4 cores heavily; only one core is at 100% at any given time (even with TBB).
Optical flow estimation in general is quite a time-consuming operation. I would suggest changing the optical flow method.
DualTVL1OpticalFlow is a more performant method in OpenCV that you can use. If this method is still too slow, calcOpticalFlowPyrLK should be used. However, this is a sparse motion estimation method and does not directly return a dense motion field.
To get one: initialize a set of points on a grid over your frame (e.g. grid step = 10) and track them with calcOpticalFlowPyrLK. The difference between the tracked and initial points gives you the optical flow at each grid position. Finally, you have to interpolate between the grid points, e.g. using nearest-neighbour or linear interpolation.
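The grid-and-interpolate steps above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the answerer's code: OpenCV is left out so the block is self-contained, `Vec2` stands in for cv::Point2f, and `makeGrid`, `displacement`, and `nearestFlow` are hypothetical helper names. In a real program the `tracked` points would be the output of calcOpticalFlowPyrLK.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for cv::Point2f so the sketch compiles without OpenCV (assumption).
struct Vec2 {
    float x;
    float y;
};

// Regular grid of points covering a width x height frame, column by column.
std::vector<Vec2> makeGrid(int width, int height, int step) {
    assert(width > 0 && height > 0 && step > 0);
    std::vector<Vec2> grid;
    for (int x = 0; x < width; x += step)
        for (int y = 0; y < height; y += step)
            grid.push_back({static_cast<float>(x), static_cast<float>(y)});
    return grid;
}

// Flow at each grid position = tracked position minus initial position.
std::vector<Vec2> displacement(const std::vector<Vec2>& initial,
                               const std::vector<Vec2>& tracked) {
    assert(initial.size() == tracked.size());
    std::vector<Vec2> flow(initial.size());
    for (std::size_t i = 0; i < initial.size(); ++i)
        flow[i] = {tracked[i].x - initial[i].x, tracked[i].y - initial[i].y};
    return flow;
}

// Nearest-neighbour interpolation: the flow at pixel (px, py) is the flow of
// the closest grid point. Relies on the column-by-column layout of makeGrid.
Vec2 nearestFlow(const std::vector<Vec2>& flow, int width, int height,
                 int step, int px, int py) {
    const int cols = (width + step - 1) / step;   // grid points along x
    const int rows = (height + step - 1) / step;  // grid points along y
    const int gx = std::min((px + step / 2) / step, cols - 1);
    const int gy = std::min((py + step / 2) / step, rows - 1);
    return flow[static_cast<std::size_t>(gx) * rows + gy];
}
```

Nearest-neighbour lookup is the cheapest option; linear interpolation between the four surrounding grid points would give a smoother field at a somewhat higher cost.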
First, I want to say thanks for the other answer, which I used to build my final solution. I will explain it in as much detail as I can.
My solution is divided into two parts:
Multithreading - Splitting each frame into 4 matrices, one per quarter, and creating 4 threads so that each quarter is processed in a different thread. I created the 4 quarter matrices with some overlap between them (each quarter extends 5% past the midline) so that I won't lose the connection between them (see figure below - the yellow part is 55% of the width and 55% of the height).
    Q1 = cv::UMat(gray, Range(0, HEIGHT_RES*0.55), Range(0, WIDTH_RES*0.55));
    Q2 = cv::UMat(gray, Range(0, HEIGHT_RES*0.55), Range(WIDTH_RES*0.45, WIDTH_RES));
    Q3 = cv::UMat(gray, Range(0.45*HEIGHT_RES, HEIGHT_RES), Range(0, WIDTH_RES*0.55));
    Q4 = cv::UMat(gray, Range(0.45*HEIGHT_RES, HEIGHT_RES), Range(WIDTH_RES*0.45, WIDTH_RES));
Each thread does the optical flow processing (part 2 below) on one quarter, and the main loop waits for all threads to finish, then collects the results and averages them.
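As a rough illustration of the split-and-join pattern, here is a minimal sketch. It is not my actual code: OpenCV is left out so the block stands alone, `Span` mirrors cv::Range with plain ints, integer arithmetic (55/100, 45/100) replaces the floating-point factors, and `Quadrant`, `makeQuadrants`, and `processInParallel` are hypothetical names. The `worker` argument is a placeholder for the per-quarter optical-flow step. Depending on the toolchain, std::thread may need the -pthread flag.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Half-open [start, end) pixel range, mirroring cv::Range (assumption).
struct Span {
    int start;
    int end;
};

struct Quadrant {
    Span rows;
    Span cols;
};

// Four quadrants that each cover 55% of the width and height, so that
// neighbouring quadrants share a band around the frame's midlines.
std::array<Quadrant, 4> makeQuadrants(int width, int height) {
    assert(width > 0 && height > 0);
    const Span top{0, height * 55 / 100};
    const Span bottom{height * 45 / 100, height};
    const Span left{0, width * 55 / 100};
    const Span right{width * 45 / 100, width};
    return {{Quadrant{top, left}, Quadrant{top, right},
             Quadrant{bottom, left}, Quadrant{bottom, right}}};
}

// Runs `worker` on every quadrant in its own thread, waits for all of them,
// and returns the mean of the four per-quadrant results.
double processInParallel(const std::array<Quadrant, 4>& quads,
                         const std::function<double(const Quadrant&)>& worker) {
    std::array<double, 4> results{};
    std::vector<std::thread> threads;
    for (std::size_t i = 0; i < quads.size(); ++i)
        threads.emplace_back([&, i] { results[i] = worker(quads[i]); });
    for (std::thread& t : threads)
        t.join();  // the caller blocks here until every quadrant is done
    double sum = 0.0;
    for (double r : results) sum += r;
    return sum / static_cast<double>(results.size());
}
```

Each thread writes only to its own slot in `results`, so no locking is needed. Creating the threads anew on every frame adds some overhead; a pool of long-lived worker threads would be a possible refinement.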
Using a sparse method - Using the calcOpticalFlowPyrLK method within a selected ROI grid instead of using calcOpticalFlowFarneback. Using the Lucas-Kanade sparse method instead of the Farneback dense method consumes much less CPU time. In my case I created a grid with gridstep=10. This is the simple function for creating the grid:
    void createGrid(vector<cv::Point2f> &grid, int16_t wRes, int16_t hRes, int step){
        for (int i = 0; i < wRes; i += step)
            for (int j = 0; j < hRes; j += step)
                grid.push_back(cv::Point2f(i,j));
    }
Note that if the grid is constant during the whole run, it is better to only create it once before entering the main loop.
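One way to reduce the tracked grid points to a single motion estimate is to average the displacements of the points that were tracked successfully. The following is a minimal sketch under assumptions, not the code I ran: `Pt` stands in for cv::Point2f, `averageFlow` is a hypothetical helper, and the `status` flags are assumed to be the per-point status vector that calcOpticalFlowPyrLK fills in (non-zero when the flow for that point was found).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for cv::Point2f so the sketch compiles without OpenCV (assumption).
struct Pt {
    float x;
    float y;
};

// Mean displacement over the points whose status flag is non-zero.
// Returns {0, 0} when no point was tracked.
Pt averageFlow(const std::vector<Pt>& initial, const std::vector<Pt>& tracked,
               const std::vector<unsigned char>& status) {
    assert(initial.size() == tracked.size() && initial.size() == status.size());
    double sumX = 0.0;
    double sumY = 0.0;
    std::size_t count = 0;
    for (std::size_t i = 0; i < initial.size(); ++i) {
        if (!status[i]) continue;  // tracking failed for this point
        sumX += tracked[i].x - initial[i].x;
        sumY += tracked[i].y - initial[i].y;
        ++count;
    }
    if (count == 0) return {0.0f, 0.0f};
    return {static_cast<float>(sumX / count), static_cast<float>(sumY / count)};
}
```

Because the grid does not change, the `initial` vector can be built once before the main loop and passed in unchanged for every frame.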
After implementing both parts, all 4 cores of the Odroid U3 were constantly running at 60%-80% load when the program ran, and performance improved.
Source: https://stackoverflow.com/questions/37507645/accelerating-opticalflow-algorithm-opencv