Question
Assume I have good correspondences between two images and attempt to recover the camera motion between them. I can use OpenCV 3's new facilities for this, like this:
Mat E = findEssentialMat(imgpts1, imgpts2, focal, principalPoint, RANSAC, 0.999, 1, mask);
int inliers = recoverPose(E, imgpts1, imgpts2, R, t, focal, principalPoint, mask);
Mat mtxR, mtxQ;
Mat Qx, Qy, Qz;
Vec3d angles = RQDecomp3x3(R, mtxR, mtxQ, Qx, Qy, Qz);
cout << "Translation: " << t.t() << endl;
cout << "Euler angles [x y z] in degrees: " << angles.t() << endl;
Now, I have trouble wrapping my head around what R and t actually mean. Are they the transform needed to map point coordinates from camera space 1 to camera space 2, as in p_2 = R * p_1 + t?
Consider this example, with ground-truth, manually labeled correspondences:
The output I get is this:
Translation: [-0.9661243151855488, -0.04921320381132761, 0.253341406362796]
Euler angles [x y z] in degrees: [9.780449804801876, 46.49315494782735, 15.66510133665445]
I try to match this to what I see in the images and come up with the interpretation that [-0.96, -0.04, 0.25] tells me I have moved to the right, since the coordinates have moved along the negative x-axis; but it would also tell me I have moved further away, since the coordinates have moved along the positive z-axis.
I have also rotated the camera around the y-axis (to the left, which I think would be a counter-clockwise rotation around the negative y-axis, because in OpenCV the y-axis points downwards, does it not?).
Question: Is my interpretation correct, and if not, what is the correct one?
Answer 1:
It turns out my interpretation is correct: the relation p2 = R * p1 + t does indeed hold. One can verify this by using cv::triangulatePoints() and cv::convertPointsFromHomogeneous() to obtain 3D coordinates from corresponding points (relative to camera 1) and then applying the above equation. Projecting with camera 2's camera matrix then yields the image coordinates of p2.
Answer 2:
Your interpretation sounds about right to me. I'm not 100% sure about the orientation of the axes in OpenCV, but I believe you are correct about the y-axis.
The output makes sense as well, not just from a code perspective: if you look at the two images, you can roughly imagine where a full 90-degree rotation would point (it would be essentially the same angle, but on the opposite side of the car).
This is a pretty decent explanation of the concept via rigid-body motion mechanics, too: http://nghiaho.com/?page_id=671
Answer 3:
Actually, your interpretation is correct.
First of all, you are right about the orientation of the y-axis. For an illustration of OpenCV's camera coordinate system, see here.
Your code returns the R and t from the second camera to the first. This means that if x1 is a point in the first image and x2 is the corresponding point in the second image, the following equation holds: x1 = R*x2 + t.
Now, in your case the right image (front view) is from camera 1, and the left image (side view) of the car is from camera 2.
Looking at this equation, we see that the rotation is applied first. So imagine your camera currently pictures the left frame. Now your R specifies a rotation of about 46 degrees around the y-axis. As rotating points by an angle alpha is the same as counter-rotating the coordinate axes by that angle, your R tells you to rotate left. As you yourself point out, this seems correct when looking at the pictures. As the rotations around the other axes are small and hard to visualize, let's omit them here. So after applying the rotation, you are still standing at the position the left frame was taken from, but your camera more or less points at the back of the car, or the space directly behind it.
Now let us look at the translation vector. Your interpretation about moving to the right and further away is correct as well. Let me try to explain why. Imagine that from your current position, with the new camera direction, you only moved to the right: you would bump straight into the car, or would need to hold the camera above its engine hood. So after moving to the right, you also need to move further away to reach the position the right picture was taken from.
I hope this explanation helped you to imagine the movement your R and t describe.
Answer 4:
Let's see. The OpenCV camera coordinate frame is "X toward image right, Y toward image bottom, Z = X x Y toward the scene". Q = [R|t] is the coordinate transform from camera 2 to camera 1, so that t is the vector rooted at camera 1, with its tip at camera 2, expressed in camera 1's frame. Thus your translation vector implies that camera 2 is to the left of camera 1, which, given your images, is possible only if the car's side view is in camera 2 and the car's front view is in camera 1. This is consistent with the positive Z component of the translation, since in the side view the car appears further away from the camera.
This identification is also consistent with the Euler angles you computed: they are returned in the OpenGL convention, thus expressing the rotation from source to destination. In your case, a rotation of 46 degrees about the vertical axis of camera 1, counterclockwise w.r.t. the downward-oriented Y axis, brings you just about to the side view you have.
Source: https://stackoverflow.com/questions/31447128/camera-pose-estimation-how-do-i-interpret-rotation-and-translation-matrices