Question
I am trying to track an object using correlation: frame by frame, I search for a small patch inside a larger image, find the shift at which the correlation is maximal, and update the patch from that location.
My code is:
#include <opencv2/opencv.hpp>
using namespace cv;

cv::Mat imagePart, imageBig;   // current patch and search frame, set elsewhere
cv::Mat im_float_2, imagePart_out;
cv::Mat im_floatBig;
cv::Scalar im1_Mean, im1_Std, im2_Mean, im2_Std;
double covar, correl;
int n_pixels;

void computeShift()
{
    int maxRow = 0, maxCol = 0, TX = 0, TY = 0;
    double GMAX = 0;
    Mat image_window = Mat::zeros(imagePart.rows, imagePart.cols, CV_32F);
    imagePart.convertTo(im_float_2, CV_32F);
    imageBig.convertTo(im_floatBig, CV_32F);
    for (maxRow = 0; maxRow <= imageBig.rows - image_window.rows; maxRow++)
    {
        for (maxCol = 0; maxCol <= imageBig.cols - image_window.cols; maxCol++)
        {
            image_window = im_floatBig(cv::Rect(maxCol, maxRow,
                                                image_window.cols, image_window.rows));
            n_pixels = image_window.rows * image_window.cols;
            // Compute mean and standard deviation of both images
            meanStdDev(image_window, im1_Mean, im1_Std);
            meanStdDev(im_float_2, im2_Mean, im2_Std);
            // Compute covariance and correlation coefficient
            covar = (image_window - im1_Mean).dot(im_float_2 - im2_Mean) / n_pixels;
            correl = covar / (im1_Std[0] * im2_Std[0]);
            if (correl > GMAX)
            {
                GMAX = correl; TX = maxRow; TY = maxCol;
                image_window.convertTo(imagePart, CV_8UC1);
            }
        }
    }
    cvtColor(imagePart, imagePart_out, CV_GRAY2BGR);
    printf("\nComputed shift: [%d, %d] MAX: %f\n", TX, TY, GMAX);
}
But when executing this I get very low FPS (1-2), even for a small video (frame size 262x240, patch size 25x25).
Is there any way to achieve a higher FPS? I am also looking in the direction of phase correlation, but I am not sure how to go about it from here. Would converting to the frequency domain help?
For now, I want to optimize the above code for speed.
Answer 1:
Yes, you will likely gain from using the FFT. Simply pad im_float_2 to the size of im_floatBig. Multiplying in the Fourier domain, after taking the complex conjugate of one of the transforms, yields the cross-correlation. This is not the same as your correl value (there is no division by the standard deviations), but I don't think you actually need that normalization for good template matching: the plain cross-correlation works well by itself. The location of the maximum in the result can be translated to a displacement of the template w.r.t. the image.
The steps for cross-correlation through the FFT are:
- Pad the template (floating image) to the size of the other image (with zeros).
- Compute the FFT of both.
- Flip the sign of the imaginary component of one of the results (complex conjugate).
- Multiply the two.
- Compute the IFFT of the result.
- Find the location of the pixel with the largest value.
The location of this pixel indicates the translation of the padded template w.r.t. the other image. If they match best without translation, the max pixel will be at (x,y) = (0,0); if it is at (1,0), that indicates a one-pixel shift along x. Which direction that is depends on which of the two transforms you took the complex conjugate of. Note that this result is periodic: a one-pixel shift in the opposite direction puts the max pixel on the right edge of the image. Simply experiment a bit to determine how to translate the peak location into a shift of your template.
Regarding your code:
- meanStdDev(im_float_2, im2_Mean, im2_Std) is computed inside the loop, even though im_float_2 never changes. In fact, you could get away with not normalizing at all: you are only looking for the maximum correlation, and dividing every value in your search by the same constant doesn't change which one is the largest. The same applies to the division by n_pixels.
- Move image_window.convertTo(imagePart, CV_8UC1) outside the loop. You will likely update your current maximum many times before finding the actual one, and there is no point in converting all those sub-windows to CV_8U if you only end up using the last one. Inside the loop, record only the (x,y) coordinates of the maximum; convert the final window once.
- You probably don't need to search the whole image for your template. The object likely moves only a small amount between frames, so look only in a small region around the previous known location. The same idea applies to the FFT method: crop out a region of your big image and pad your template to that size. A smaller FFT is cheaper to compute.
- OpenCV stores images row-wise, so keep the loop over the columns as the inner loop: it walks memory contiguously and makes much better use of the cache.
Source: https://stackoverflow.com/questions/52753902/tackle-low-fps-for-correlation-code-to-compute-shift-in-image