Camera motion from corresponding images

I'm trying to calculate a new camera position from the motion between two corresponding images. The images conform to the pinhole camera model.

So far I don't get useful results, so I'll describe my procedure in the hope that somebody can help me.

I detect features in both images with SIFT, match them with OpenCV's FlannBasedMatcher, and calculate the fundamental matrix with OpenCV's findFundamentalMat (method RANSAC).

Then I calculate the essential matrix from the fundamental matrix and the camera intrinsic matrix (K):

Mat E = K.t() * F * K;

I decompose the essential matrix into rotation and translation with a singular value decomposition:

SVD decomp = SVD(E);
Matx33d W(0,-1,0,
          1,0,0,
          0,0,1);
Matx33d Wt(0,1,0,
          -1,0,0,
           0,0,1);
R1 = decomp.u * Mat(W) * decomp.vt;
R2 = decomp.u * Mat(Wt) * decomp.vt;
t1 = decomp.u.col(2); //u3
t2 = -decomp.u.col(2); //u3

Then I try to find the correct solution by triangulation. (This part is from http://www.morethantechnical.com/2012/01/04/simple-triangulation-with-opencv-from-harley-zisserman-w-code/ so I think it should work correctly.)

The new position is then calculated with:

new_pos = old_pos + -R.t()*t;

where new_pos & old_pos are vectors (3x1), R the rotation matrix (3x3) and t the translation vector (3x1).

Unfortunately I get no useful results, so maybe someone has an idea what could be wrong.

Here are some results (just in case someone can confirm that any of them is definitely wrong):

F = [8.093827077399547e-07, 1.102681999632987e-06, -0.0007939604310854831;
     1.29246107737264e-06, 1.492629957878578e-06, -0.001211264339006535;
     -0.001052930954975217, -0.001278667878010564, 1]

K = [150, 0, 300;
    0, 150, 400;
    0, 0, 1]

E = [0.01821111092414898, 0.02481034499174221, -0.01651092283654529;
     0.02908037424088439, 0.03358417405226801, -0.03397110489649674;
     -0.04396975675562629, -0.05262169424538553, 0.04904210357279387]

t = [0.2970648246214448; 0.7352053067682792; 0.6092828956013705]

R = [0.2048034356172475, 0.4709818957303019, -0.858039396912323;
     -0.8690270040802598, -0.3158728880490416, -0.3808101689488421;
     -0.4503860776474556, 0.8236506374002566, 0.3446041331317597]

First of all you should check whether

x'^T * F * x = 0

holds for your point correspondences x' and x, written as homogeneous pixel coordinates. Of course, this should only be the case (approximately) for the inliers of the fundamental matrix estimation with RANSAC.
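
As a minimal sketch of that check, the snippet below evaluates x'^T * F * x for one correspondence. It uses plain std::array types instead of cv::Mat so that it compiles on its own; the names Vec3, Mat3 and epipolarResidual are made up for illustration, and the points are assumed to be homogeneous pixel coordinates (u, v, 1).

```cpp
#include <array>

using Vec3 = std::array<double, 3>;
using Mat3 = std::array<std::array<double, 3>, 3>;

// Returns x2^T * F * x1, where x1 is a point in the first image and
// x2 is the matching point in the second image (both homogeneous).
double epipolarResidual(const Mat3& F, const Vec3& x1, const Vec3& x2) {
    double result = 0.0;
    for (int r = 0; r < 3; ++r) {
        double Fx = 0.0;  // r-th component of F * x1
        for (int c = 0; c < 3; ++c) {
            Fx += F[r][c] * x1[c];
        }
        result += x2[r] * Fx;
    }
    return result;
}
```

For an inlier the value should be close to zero. How close depends on the scale of F and on the image noise, so it is probably more informative to compare inliers against outliers than to rely on a fixed threshold.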

Thereafter, you have to transform your point correspondences to normalized camera coordinates (NCC) like this

xn = inv(K) * x
xn' = inv(K') * x'

where K' is the intrinsic camera matrix of the second image and x' are the points of the second image. I think K = K' in your case.
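
For a K without skew, such as the one in the question, inv(K) * x can be written out directly. A small sketch, where Intrinsics and toNormalized are hypothetical names and zero skew is an assumption:

```cpp
#include <array>

using Vec3 = std::array<double, 3>;

// Entries of K = [fx, 0, cx; 0, fy, cy; 0, 0, 1] (zero skew assumed).
struct Intrinsics {
    double fx, fy, cx, cy;
};

// Computes inv(K) * (u, v, 1) for a pixel (u, v).
Vec3 toNormalized(const Intrinsics& K, double u, double v) {
    return {{(u - K.cx) / K.fx, (v - K.cy) / K.fy, 1.0}};
}
```

With the K from the question (fx = fy = 150, cx = 300, cy = 400), the pixel (450, 550) maps to (1, 1, 1).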

With these NCCs you can decompose your essential matrix as you described. Triangulate the normalized camera coordinates and check the depth of the triangulated points. Be careful, though: the literature says that one point is sufficient to select the correct rotation and translation, but in my experience you should check a few points, since a single point can be an outlier even after RANSAC.
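
One possible way to check several points at once is to count, for each of the four (R, t) candidates, how many triangulated points have positive depth in both cameras, and to keep the candidate with the highest count. The sketch below assumes the points were triangulated with P = [I | 0] and P' = [R | t], so they are expressed in the first camera's frame; countPointsInFront is a hypothetical helper, not an OpenCV function.

```cpp
#include <array>
#include <vector>

using Vec3 = std::array<double, 3>;
using Mat3 = std::array<std::array<double, 3>, 3>;

// Counts the points that lie in front of both cameras.
// Depth in camera 1 is X.z; depth in camera 2 is the z component of R * X + t.
int countPointsInFront(const Mat3& R, const Vec3& t,
                       const std::vector<Vec3>& points) {
    int count = 0;
    for (const Vec3& X : points) {
        const double depth1 = X[2];
        const double depth2 =
            R[2][0] * X[0] + R[2][1] * X[1] + R[2][2] * X[2] + t[2];
        if (depth1 > 0.0 && depth2 > 0.0) {
            ++count;
        }
    }
    return count;
}
```

Each candidate has to be scored on the points triangulated with that same candidate, since the triangulated positions change with R and t.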

Before you decompose the essential matrix, make sure that E = U*diag(1,1,0)*Vt, i.e. that its singular values are (1, 1, 0). This condition is required to get correct results for the four possible choices of the projection matrix.

When you've got the correct rotation and translation you can triangulate all your point correspondences (the inliers of the fundamental matrix estimation with RANSAC). Then, you should compute the reprojection error. Firstly, you compute the reprojected position like this

xp = K * P * X
xp' = K' * P' * X

where X is the computed (homogeneous) 3D position and P and P' are the 3x4 projection matrices. P is normally the identity with a zero fourth column, P = [I | 0]. P' = [R | t] has the rotation matrix in its first three columns and the translation in its fourth column, so that P' is a 3x4 matrix. This only works if you transform your 3D position to homogeneous coordinates, i.e. 4x1 vectors instead of 3x1. Then xp and xp' are also homogeneous coordinates representing the (reprojected) 2D positions of your corresponding points.

I think the

new_pos = old_pos + -R.t()*t;

is incorrect because, firstly, you only translate old_pos without rotating it and, secondly, you translate it by the wrong vector. The correct way is given above.

So, after you have computed the reprojected points you can calculate the reprojection error. Since you are working with homogeneous coordinates, you have to normalize them first (xp = xp / xp(2), i.e. divide by the last coordinate). The error is then given by

error = (x(0)-xp(0))^2 + (x(1)-xp(1))^2

If the error is large, such as 10^2, your intrinsic camera calibration or your rotation/translation is incorrect (perhaps both). Depending on your coordinate system you can try to invert your projection matrices. To do so you first have to make them square, since you cannot invert a 3x4 matrix (without the pseudo-inverse): add the fourth row [0 0 0 1], compute the inverse, and remove the fourth row again.
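
Because [R | t] with the row [0 0 0 1] appended is a rigid transform, its inverse has the closed form [R^T | -R^T * t], so no general 4x4 inversion is needed. A sketch, assuming R is a proper rotation matrix (Mat34 and invertRigid are illustrative names):

```cpp
#include <array>

using Mat34 = std::array<std::array<double, 4>, 3>;

// Inverts P = [R | t] as if the row [0 0 0 1] had been appended,
// and returns the first three rows of the inverse: [R^T | -R^T * t].
Mat34 invertRigid(const Mat34& P) {
    Mat34 inv{};
    for (int r = 0; r < 3; ++r) {
        double translation = 0.0;  // r-th component of R^T * t
        for (int c = 0; c < 3; ++c) {
            inv[r][c] = P[c][r];              // transpose of the rotation
            translation += P[c][r] * P[c][3];
        }
        inv[r][3] = -translation;
    }
    return inv;
}
```

If R is not orthonormal (for example because it was taken from a decomposition without checking it), this shortcut does not give the true inverse and a general matrix inversion would be needed instead.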

There is one more thing with reprojection error. In general, the reprojection error is the squared distance between your original point correspondence (in each image) and the reprojected position. You can take the square root to get the Euclidean distance between both points.
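
Put together, the projection, the division by the last coordinate and the squared distance could look like the following sketch. It again uses plain std::array types rather than cv::Mat, and the function name squaredReprojectionError is hypothetical.

```cpp
#include <array>

using Vec3 = std::array<double, 3>;
using Vec4 = std::array<double, 4>;
using Mat3 = std::array<std::array<double, 3>, 3>;
using Mat34 = std::array<std::array<double, 4>, 3>;

// Projects the homogeneous 3D point X with xp = K * P * X, normalizes xp
// by its last coordinate and returns the squared pixel distance to the
// observed image point (u, v).
double squaredReprojectionError(const Mat3& K, const Mat34& P,
                                const Vec4& X, double u, double v) {
    Vec3 cam{};  // P * X
    for (int r = 0; r < 3; ++r) {
        for (int c = 0; c < 4; ++c) {
            cam[r] += P[r][c] * X[c];
        }
    }
    Vec3 xp{};  // K * (P * X)
    for (int r = 0; r < 3; ++r) {
        for (int c = 0; c < 3; ++c) {
            xp[r] += K[r][c] * cam[c];
        }
    }
    const double du = u - xp[0] / xp[2];
    const double dv = v - xp[1] / xp[2];
    return du * du + dv * dv;
}
```

Taking std::sqrt of the returned value gives the Euclidean distance in pixels. The same function can be called once with P and the point from the first image and once with P' and the point from the second image.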

To update your camera position, you have to update the translation first, then update the rotation matrix.

t_ref += lambda * (R_ref * t);
R_ref = R * R_ref;

where t_ref and R_ref are your camera state, R and t are the newly calculated camera rotation and translation, and lambda is the scale factor.
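
Written out without OpenCV types, the update could look like the following sketch. Pose and updatePose are hypothetical names, and lambda has to come from an external source, because the translation recovered from the essential matrix is only known up to scale.

```cpp
#include <array>

using Vec3 = std::array<double, 3>;
using Mat3 = std::array<std::array<double, 3>, 3>;

// Accumulated camera state, starting at the identity pose.
struct Pose {
    Mat3 R_ref{{ {{1.0, 0.0, 0.0}}, {{0.0, 1.0, 0.0}}, {{0.0, 0.0, 1.0}} }};
    Vec3 t_ref{{0.0, 0.0, 0.0}};
};

// Applies one relative motion (R, t) with scale factor lambda.
void updatePose(Pose& pose, const Mat3& R, const Vec3& t, double lambda) {
    // t_ref += lambda * (R_ref * t), using R_ref from before this step.
    for (int r = 0; r < 3; ++r) {
        double rotated = 0.0;
        for (int c = 0; c < 3; ++c) {
            rotated += pose.R_ref[r][c] * t[c];
        }
        pose.t_ref[r] += lambda * rotated;
    }
    // R_ref = R * R_ref
    Mat3 updated{};
    for (int r = 0; r < 3; ++r) {
        for (int c = 0; c < 3; ++c) {
            for (int k = 0; k < 3; ++k) {
                updated[r][c] += R[r][k] * pose.R_ref[k][c];
            }
        }
    }
    pose.R_ref = updated;
}
```

The order matters here: the translation update uses the old R_ref, which is why the translation is updated before the rotation.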

Source: https://stackoverflow.com/questions/16639106/camera-motion-from-corresponding-images

Tags: OpenCV, rotation, translation, motion, 3d-reconstruction