OpenCV GPU Farneback optical flow works badly in multi-threading

Submitted by 我的梦境 on 2019-12-06 00:58:32

Question


My application uses the OpenCV GPU class gpu::FarnebackOpticalFlow to compute the optical flow between pairs of consecutive frames of an input video. To speed up the process, I used OpenCV's TBB support to run the method in multiple threads. However, the multi-threaded version does not behave like the single-threaded one. To give you an idea of the difference, here are two snapshots, of the single-threaded and the multi-threaded implementation respectively.

The multi-threaded implementation splits the image into 8 horizontal stripes (one per core on my PC) and applies the GPU Farneback optical flow method to each of them. Here is the corresponding code for both methods:

Single-threaded implementation

/* main.cpp */
//prevImg and img are the input Mat images extracted from the input video
...
GpuMat gpuImg8U(img);
GpuMat gpuPrevImg8U(prevImg);   
GpuMat u_flow, v_flow;
gpu::FarnebackOpticalFlow farneback_flow;
farneback_flow.numLevels = maxLayer;
farneback_flow.pyrScale = 0.5;
farneback_flow.winSize = windows_size;
farneback_flow.numIters = of_iterations;
farneback_flow(gpuPrevImg8U,gpuImg8U,u_flow,v_flow);
getFlowField(Mat(u_flow),Mat(v_flow),optical_flow);

...
}

void getFlowField(const Mat& u, const Mat& v, Mat& flowField){    
    for (int i = 0; i < flowField.rows; ++i){
        const float* ptr_u = u.ptr<float>(i);
        const float* ptr_v = v.ptr<float>(i);
        Point2f* row = flowField.ptr<Point2f>(i);

        for (int j = 0; j < flowField.cols; ++j){
            row[j].y = ptr_v[j];
            row[j].x = ptr_u[j];
        }
    }
}

Multi-threaded implementation

/* parallel.h */
class ParallelOpticalFlow : public cv::ParallelLoopBody {

    private:
        int coreNum;
        cv::gpu::GpuMat img, img2;
        cv::gpu::FarnebackOpticalFlow& farneback_flow;
        const cv::gpu::GpuMat u_flow, v_flow;
        cv::Mat& optical_flow;

    public:
        // Initializers are listed in member declaration order (avoids -Wreorder warnings).
        ParallelOpticalFlow(int cores, cv::gpu::FarnebackOpticalFlow& flowHandler, cv::gpu::GpuMat img_, cv::gpu::GpuMat img2_, const cv::gpu::GpuMat u, const cv::gpu::GpuMat v, cv::Mat& of)
                    : coreNum(cores), img(img_), img2(img2_), farneback_flow(flowHandler), u_flow(u), v_flow(v), optical_flow(of){}

        virtual void operator()(const cv::Range& range) const;

};


/* parallel.cpp*/
void ParallelOpticalFlow::operator()(const cv::Range& range) const {

    for (int k = range.start ; k < range.end ; k ++){

        cv::gpu::GpuMat img_rect(img,cv::Rect(0,img.rows/coreNum*k,img.cols,img.rows/coreNum));
        cv::gpu::GpuMat img2_rect(img2,cv::Rect(0,img2.rows/coreNum*k,img2.cols,img2.rows/coreNum));
        cv::gpu::GpuMat u_rect(u_flow,cv::Rect(0,u_flow.rows/coreNum*k,u_flow.cols,u_flow.rows/coreNum));
        cv::gpu::GpuMat v_rect(v_flow,cv::Rect(0,v_flow.rows/coreNum*k,v_flow.cols,v_flow.rows/coreNum));
        cv::Mat of_rect(optical_flow,cv::Rect(0,optical_flow.rows/coreNum*k,optical_flow.cols,optical_flow.rows/coreNum));

        farneback_flow(img_rect,img2_rect,u_rect,v_rect);
        getFlowField(cv::Mat(u_rect),cv::Mat(v_rect),of_rect);
    }
}

/* main.cpp */

    parallel_for_(Range(0,cores_num),ParallelOpticalFlow(cores_num,farneback_flow,gpuPrevImg8U,gpuImg8U,u_flow,v_flow,optical_flow));

The code looks equivalent in the two cases. Can anyone explain why the behaviour differs, or whether there is a mistake in my code? Thanks in advance for your answers.
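
A side note on the stripe arithmetic in the multi-threaded code above: each stripe has height rows/coreNum using integer division, so when the row count is not a multiple of coreNum the last rows % coreNum rows are not covered by any stripe. Splitting also means each stripe is processed without the pixels of its neighbours, which likely changes the flow near the stripe boundaries compared with a full-frame call. The sketch below only illustrates the first point. It is plain C++ with no OpenCV dependency, and RowRange and splitRows are hypothetical names introduced here, not part of OpenCV or of the original code.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper type: half-open row range [begin, end) of one stripe.
struct RowRange {
    int begin;
    int end;
};

// Split `rows` image rows into `stripes` contiguous ranges that together
// cover every row, even when rows is not divisible by stripes.
std::vector<RowRange> splitRows(int rows, int stripes) {
    std::vector<RowRange> out;
    out.reserve(stripes);
    for (int k = 0; k < stripes; ++k) {
        // Compute both boundaries from the same formula so that the end of
        // stripe k is exactly the beginning of stripe k + 1.
        int begin = static_cast<int>(static_cast<long long>(rows) * k / stripes);
        int end = static_cast<int>(static_cast<long long>(rows) * (k + 1) / stripes);
        out.push_back({begin, end});
    }
    return out;
}
```

With a range from this helper, a stripe rectangle would be built as cv::Rect(0, r.begin, cols, r.end - r.begin) instead of using a fixed height of rows/coreNum. For example, with 1083 rows and 8 stripes, the fixed-height version covers only 1080 rows.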


Answer 1:


The GPU module is not thread-safe. It uses some global variables, such as __constant__ memory and the texture reference API, which can lead to data races when used in a multi-threaded environment.
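
To illustrate the kind of race described here, the following is a minimal sketch in plain C++ that does not call OpenCV at all. The global g_param stands in for library-wide state such as a __constant__ parameter, and fakeGpuCall, guardedGpuCall and runGuarded are hypothetical names made up for this example. Every call writes the shared state and then reads it back, so unsynchronized calls from several threads can see each other's parameters. Guarding the whole call with a std::mutex makes the write and the read atomic with respect to other threads.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Stand-in for global state inside a library (assumption for illustration).
static int g_param = 0;
// One mutex that serializes every call touching the global state.
static std::mutex g_gpuMutex;

// Hypothetical stand-in for a GPU routine: "uploads" a parameter to global
// state, then the "kernel" reads it back. Not thread-safe on its own.
int fakeGpuCall(int param) {
    g_param = param;             // upload step writes shared state
    std::this_thread::yield();   // give other threads a chance to interleave
    return g_param * 2;          // kernel step reads shared state
}

// Thread-safe wrapper: the lock is held for the whole upload + kernel sequence.
int guardedGpuCall(int param) {
    std::lock_guard<std::mutex> lock(g_gpuMutex);
    return fakeGpuCall(param);
}

// Run one guarded call per thread and collect the results by thread index.
std::vector<int> runGuarded(int threads) {
    std::vector<int> results(threads, 0);
    std::vector<std::thread> pool;
    for (int k = 0; k < threads; ++k) {
        pool.emplace_back([&results, k] { results[k] = guardedGpuCall(k); });
    }
    for (auto& t : pool) {
        t.join();
    }
    return results;
}
```

Note that a lock like this serializes the GPU calls, so it removes the race but also removes any benefit from calling the GPU routine from several CPU threads. Since the GPU call already processes the frame in parallel on the device, the single-threaded full-frame version is likely the simpler choice here.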



Source: https://stackoverflow.com/questions/34990228/opencv-gpu-farneback-optical-flow-badly-works-in-multi-threading
