Accelerating OpticalFlow Algorithm - OpenCV

隐身守侯, posted on 2019-12-06 07:30:46

Optical flow estimation is in general quite a time-consuming operation. I would suggest changing the optical flow method.

DualTVL1OpticalFlow is a more performant method in OpenCV that you can use. If this method is still too slow, calcOpticalFlowPyrLK should be used. However, this is a sparse motion estimation method and does not directly return a dense motion field. To obtain one, initialize a set of points on a grid over your frame (e.g. grid step = 10) and track these points with calcOpticalFlowPyrLK. The difference between the tracked and initial points gives you the optical flow at each grid position. Finally, interpolate between the grid points, e.g. with nearest-neighbour or linear interpolation.
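The last step, turning per-grid displacements into a dense field, could look roughly like the sketch below. It is plain C++ rather than OpenCV code and only illustrates nearest-neighbour interpolation. The `Vec2` type, the function name `densifyNearest`, and the row-major layout of `gridFlow` are assumptions made for this example; the displacements themselves are assumed to have already been computed as tracked point minus initial point.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 2-D displacement type, used only for this illustration.
struct Vec2 { float x, y; };

// Grid sample (gx, gy) is assumed to sit at pixel (gx * step, gy * step),
// and gridFlow is assumed to be row-major with gridW * gridH entries.
// Returns a row-major width * height field in which every pixel copies
// the displacement of its nearest grid sample.
std::vector<Vec2> densifyNearest(const std::vector<Vec2>& gridFlow,
                                 int gridW, int gridH,
                                 int width, int height, int step)
{
    assert(step > 0 && gridW > 0 && gridH > 0);
    assert(gridFlow.size() == static_cast<std::size_t>(gridW) * gridH);

    std::vector<Vec2> dense(static_cast<std::size_t>(width) * height);
    for (int y = 0; y < height; ++y) {
        // Round to the closest grid row and clamp to the last one.
        const int gy = std::min((y + step / 2) / step, gridH - 1);
        for (int x = 0; x < width; ++x) {
            const int gx = std::min((x + step / 2) / step, gridW - 1);
            const std::size_t src = static_cast<std::size_t>(gy) * gridW + gx;
            const std::size_t dst = static_cast<std::size_t>(y) * width + x;
            dense[dst] = gridFlow[src];
        }
    }
    return dense;
}
```

Linear interpolation would give a smoother field at the cost of a few more operations per pixel; nearest-neighbour is shown here only because it is the shortest to write down.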

A. Sarid

First, I want to say thanks for the other answer here, which I used to build my final solution. I will explain it in as much detail as I can.

My solution is divided into two parts:

  1. Multithreading - Splitting each frame into 4 matrices, one per quarter, then creating 4 threads and processing each quarter in a different thread. I created the 4 quarter matrices with some overlap between them (each extends 5% past the centre line) so that I won't lose the connection between them. Each quarter therefore spans 55% of the width and 55% of the height (the yellow part in the original figure).

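    // Each quarter covers 55% of the frame in both directions, so neighbouring quarters overlap around the centre lines.
    Q1 = cv::UMat(gray, cv::Range(0, int(HEIGHT_RES*0.55)), cv::Range(0, int(WIDTH_RES*0.55)));
    Q2 = cv::UMat(gray, cv::Range(0, int(HEIGHT_RES*0.55)), cv::Range(int(WIDTH_RES*0.45), WIDTH_RES));
    Q3 = cv::UMat(gray, cv::Range(int(HEIGHT_RES*0.45), HEIGHT_RES), cv::Range(0, int(WIDTH_RES*0.55)));
    Q4 = cv::UMat(gray, cv::Range(int(HEIGHT_RES*0.45), HEIGHT_RES), cv::Range(int(WIDTH_RES*0.45), WIDTH_RES));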
    

    Each thread runs the optical flow processing (part 2 below) on one quarter, and the main loop waits for all threads to finish in order to collect the results and average them.

  2. Using a sparse method - Using calcOpticalFlowPyrLK on a selected ROI grid instead of calcOpticalFlowFarneback. The sparse Lucas-Kanade method consumes much less CPU time than the dense Farneback method. In my case I created a grid with gridstep=10. This is the simple function for creating the grid:

    void createGrid(vector<cv::Point2f> &grid, int16_t wRes, int16_t hRes, int step){
        // One point every `step` pixels; i runs over x (width), j over y (height).
        for (int i = 0; i < wRes; i += step)
            for (int j = 0; j < hRes; j += step)
                grid.push_back(cv::Point2f(i, j));
    }
    

    Note that if the grid is constant during the whole run, it is better to only create it once before entering the main loop.
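The split-and-collect structure of part 1 can be sketched without OpenCV as follows. This is only an illustration under stated assumptions: `Span`, `Quarter`, `Flow`, `splitWithOverlap` and `processQuarters` are hypothetical names, `std::async` stands in for whatever threading mechanism was actually used, and the result of each quarter is assumed to be a single mean displacement that is then averaged over the four quarters. The `work` callable is a placeholder for the per-quarter Lucas-Kanade processing of part 2.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <future>

struct Span { int start, end; };        // half-open [start, end), like cv::Range
struct Quarter { Span rows, cols; };
struct Flow { float dx, dy; };          // hypothetical per-quarter mean displacement

// Four quarters, each covering 55% of the frame in both directions.
// Integer percent arithmetic is used here to avoid floating-point rounding.
std::array<Quarter, 4> splitWithOverlap(int width, int height)
{
    assert(width > 0 && height > 0);
    const int rLo = height * 45 / 100, rHi = height * 55 / 100;
    const int cLo = width * 45 / 100,  cHi = width * 55 / 100;
    return std::array<Quarter, 4>{{
        Quarter{Span{0, rHi},        Span{0, cHi}},       // top-left
        Quarter{Span{0, rHi},        Span{cLo, width}},   // top-right
        Quarter{Span{rLo, height},   Span{0, cHi}},       // bottom-left
        Quarter{Span{rLo, height},   Span{cLo, width}}    // bottom-right
    }};
}

// Runs `work` on every quarter in its own asynchronous task, waits for
// all four, and returns the average of the four results.
template <typename Work>
Flow processQuarters(const std::array<Quarter, 4>& quarters, Work work)
{
    std::array<std::future<Flow>, 4> jobs;
    for (std::size_t i = 0; i < quarters.size(); ++i)
        jobs[i] = std::async(std::launch::async, work, quarters[i]);

    Flow mean{0.f, 0.f};
    for (auto& job : jobs) {
        const Flow f = job.get();       // blocks until this quarter is done
        mean.dx += f.dx / 4.f;
        mean.dy += f.dy / 4.f;
    }
    return mean;
}
```

In a real program `work` would track the grid points inside its quarter and return their mean displacement, and on some toolchains the program has to be linked with thread support (for example `-pthread`).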

After implementing both parts, all 4 cores of the Odroid U3 were constantly working at 60%-80% load when running the program, and performance improved.
