I have 4 2D points in screen-space, and I need to reverse-project them back into 3D space. I know that each of the 4 points is a corner of a 3D-rotated rigid rectangle, and
For my OpenGL engine, the following snippet converts mouse/screen coordinates into 3D world coordinates. Read the comments for a description of what is going on.
/*
FUNCTION: YCamera :: CalculateWorldCoordinates
ARGUMENTS:
    x    mouse x coordinate
    y    mouse y coordinate
    vec  where to store coordinates
RETURN: n/a
DESCRIPTION: Convert mouse coordinates into world coordinates
*/
void YCamera :: CalculateWorldCoordinates(float x, float y, YVector3 *vec)
{
    // START
    GLint viewport[4];
    GLdouble mvmatrix[16], projmatrix[16];
    GLint real_y;
    GLdouble mx, my, mz;

    glGetIntegerv(GL_VIEWPORT, viewport);
    glGetDoublev(GL_MODELVIEW_MATRIX, mvmatrix);
    glGetDoublev(GL_PROJECTION_MATRIX, projmatrix);

    real_y = viewport[3] - (GLint) y - 1; // viewport[3] is height of window in pixels
    gluUnProject((GLdouble) x, (GLdouble) real_y, 1.0, mvmatrix, projmatrix, viewport, &mx, &my, &mz);

    /* 'mouse' is the point where mouse projection reaches FAR_PLANE.
       World coordinates is intersection of line(camera->mouse) with plane(z=0) (see LaMothe 306)

       Equation of line in 3D:
           (x-x0)/a = (y-y0)/b = (z-z0)/c

       Intersection of line with plane:
           z = 0
           x-x0 = a(z-z0)/c <=> x = x0+a(0-z0)/c <=> x = x0 -a*z0/c
           y = y0 - b*z0/c
    */
    double lx = fPosition.x - mx;
    double ly = fPosition.y - my;
    double lz = fPosition.z - mz;
    double sum = lx*lx + ly*ly + lz*lz;
    double normal = sqrt(sum);
    double z0_c = fPosition.z / (lz/normal);

    vec->x = (float) (fPosition.x - (lx/normal)*z0_c);
    vec->y = (float) (fPosition.y - (ly/normal)*z0_c);
    vec->z = 0.0f;
}
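The line/plane intersection at the end of that function can also be tried on its own, without an OpenGL context. The following is only a minimal sketch under my own assumptions: `Vec3` and `intersectWithGroundPlane` are hypothetical names that are not part of the engine above, and the far-plane point is assumed to have been obtained already (for example from gluUnProject). It uses the parametric form of the line and reports failure when the line is parallel to the plane z = 0, a case the function above does not check.

```cpp
#include <cmath>

// Hypothetical stand-in for the engine's vector type.
struct Vec3 { double x, y, z; };

// Intersect the line through `camera` and `farPoint` with the plane z = 0.
// Writes the hit point to *out and returns true, or returns false when the
// line is (nearly) parallel to the plane and no single intersection exists.
bool intersectWithGroundPlane(const Vec3& camera, const Vec3& farPoint, Vec3* out)
{
    const double dx = farPoint.x - camera.x;
    const double dy = farPoint.y - camera.y;
    const double dz = farPoint.z - camera.z;

    if (std::fabs(dz) < 1e-12) {
        return false; // direction has no z component: parallel to z = 0
    }

    // Solve camera.z + t * dz = 0 for the line parameter t.
    const double t = -camera.z / dz;

    out->x = camera.x + t * dx;
    out->y = camera.y + t * dy;
    out->z = 0.0;
    return true;
}
```

Normalizing the direction vector is not needed here, because the length cancels out when solving for the parameter t.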
Yes, Monte Carlo works, but I found a better solution for this issue. This code works perfectly (and uses OpenCV):
Cv2.CalibrateCamera(new List<List<Point3f>>() { points3d }, new List<List<Point2f>>() { points2d }, new Size(width, height), cameraMatrix, distCoefs, out rvecs, out tvecs, CalibrationFlags.ZeroTangentDist | CalibrationFlags.FixK1 | CalibrationFlags.FixK2 | CalibrationFlags.FixK3);
This function takes the known 3D and 2D points and the size of the screen, and returns the rotation (rvecs[0]), the translation (tvecs[0]) and the camera's intrinsic matrix. That is everything you need.
Infinitely many 3D rectangles project onto the same 2D shape that you have.
Think about it this way: you have four 3D points that make up the 3D rectangle. Call them (x0,y0,z0), (x1,y1,z1), (x2,y2,z2) and (x3,y3,z3). When you project these points onto the x-y plane, you drop the z coordinates: (x0,y0), (x1,y1), (x2,y2), (x3,y3).
Now, to project back into 3D space, you need to reverse-engineer what z0, ..., z3 were. But any set of z coordinates that a) keeps the same x-y distance between the points and b) keeps the shape a rectangle will work. So any member of this (infinite) set will do: {(z0+i, z1+i, z2+i, z3+i) | i ∈ R}.
Edit @Jarrett: Imagine you solved this and ended up with a rectangle in 3D space. Now, imagine sliding that rectangle up and down the z-axis. Those infinitely many translated rectangles all have the same x-y projection. How do you know you found the "right" one?
Edit #2: Alright, this is from a comment I made on this question -- a more intuitive approach to reasoning about this.
Imagine holding a piece of paper above your desk. Pretend each corner of the paper has a weightless laser pointer attached to it that points down toward the desk. The paper is the 3D object, and the laser pointer dots on the desk are the 2D projection.
Now, how can you tell how high off the desk the paper is by looking at just the laser pointer dots?
You can't. Move the paper straight up and down. The laser pointers will still shine on the same spots on the desk regardless of the height of the paper.
Finding the z-coordinates in the reverse-projection is like trying to find the height of the paper based on the laser pointer dots on the desk alone.
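The paper-and-laser argument can be checked numerically. This is a small sketch that assumes the same projection model as this answer (an orthographic projection that simply drops z); the type and function names are made up for illustration.

```cpp
#include <array>
#include <cstddef>

struct Point3 { double x, y, z; };
struct Point2 { double x, y; };

using Quad3 = std::array<Point3, 4>;
using Quad2 = std::array<Point2, 4>;

// Orthographic projection onto the x-y plane: drop the z coordinate.
Quad2 projectOrthographic(const Quad3& quad)
{
    Quad2 out{};
    for (std::size_t i = 0; i < quad.size(); ++i) {
        out[i] = Point2{quad[i].x, quad[i].y};
    }
    return out;
}

// Slide all four corners along the z-axis by the same amount.
Quad3 translateZ(Quad3 quad, double dz)
{
    for (auto& p : quad) {
        p.z += dz;
    }
    return quad;
}

// Exact comparison is fine here because x and y are never modified.
bool sameProjection(const Quad2& a, const Quad2& b)
{
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (a[i].x != b[i].x || a[i].y != b[i].y) {
            return false;
        }
    }
    return true;
}
```

For example, the tilted rectangle (0,0,1), (2,0,1), (2,1,3), (0,1,3) and the same rectangle moved by any amount along z give identical results from projectOrthographic, so the 2D points alone cannot tell them apart.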
This is the classic problem in marker-based augmented reality.
You have a square marker (a 2D barcode), and you want to find its pose (translation and rotation relative to the camera) after finding the four edges of the marker.
I'm not aware of the latest contributions to the field, but at least up to a point (2009), RPP was supposed to outperform POSIT, which is mentioned above (and is indeed a classic approach for this). Please see the links; they also provide source code.
http://www.emt.tugraz.at/~vmg/schweighofer
http://www.emt.tugraz.at/publications/EMT_TR/TR-EMT-2005-01.pdf
http://www.emt.tugraz.at/system/files/rpp_MATLAB_ref_implementation.tar.gz
(PS - I know this is a rather old topic, but the post might still be helpful to somebody)
From the 2-D space there will be 2 valid rectangles that can be built. Without knowing the original projection matrix, you won't know which one is correct. It's the same as the "box" problem: you see two squares, one inside the other, with the 4 inside vertices connected to the 4 respective outside vertices. Are you looking at a box from the top down or from the bottom up?
That being said, you are looking for a matrix transform T where...
{{x1, y1, z1}, {x2, y2, z2}, {x3, y3, z3}, {x4, y4, z4}} x T = {{x1, y1}, {x2, y2}, {x3, y3}, {x4, y4}}
(4 x 3) x T = (4 x 2)
So T must be a (3 x 2) matrix. So we've got 6 unknowns.
Now build a system of constraints on T and solve with Simplex. To build the constraints, you know that a line passing through points 1 and 2 must be parallel to the line passing through points 3 and 4. You know a line passing through points 1 and 3 must be parallel to the line passing through points 2 and 4. You know a line passing through points 1 and 2 must be orthogonal to a line passing through points 1 and 3. You know that the length of the line from 1 to 2 must equal the length of the line from 3 to 4. You know that the length of the line from 1 to 3 must equal the length of the line from 2 to 4.
To make this even easier, you know about the rectangle, so you know the length of all the sides.
That should give you plenty of constraints to solve this problem.
Of course, to get back, you can find T-inverse.
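To make the rectangle constraints concrete, here is a minimal sketch of a check that a candidate set of four 3D points satisfies them. It is not the solver itself. It assumes the sides are 1-2, 3-4, 1-3 and 2-4, and it tests the right angle at point 1 (between sides 1-2 and 1-3). The names and the absolute tolerance are my own choices for illustration.

```cpp
#include <cmath>

struct V3 { double x, y, z; };

static V3 sub(const V3& a, const V3& b) { return V3{a.x - b.x, a.y - b.y, a.z - b.z}; }
static double dot(const V3& a, const V3& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
static V3 cross(const V3& a, const V3& b)
{
    return V3{a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}
static double length(const V3& a) { return std::sqrt(dot(a, a)); }

// Returns true when p1..p4 satisfy the rectangle constraints:
// opposite sides parallel and of equal length, adjacent sides orthogonal.
bool isRectangle(const V3& p1, const V3& p2, const V3& p3, const V3& p4, double tol = 1e-9)
{
    const V3 e12 = sub(p2, p1);
    const V3 e34 = sub(p4, p3);
    const V3 e13 = sub(p3, p1);
    const V3 e24 = sub(p4, p2);

    // Reject degenerate input where a side collapses to a point.
    if (length(e12) <= tol || length(e13) <= tol) {
        return false;
    }

    // Two vectors are parallel when their cross product vanishes.
    const bool parallel = length(cross(e12, e34)) <= tol && length(cross(e13, e24)) <= tol;
    // Two vectors are orthogonal when their dot product vanishes.
    const bool orthogonal = std::fabs(dot(e12, e13)) <= tol;
    const bool equalLengths = std::fabs(length(e12) - length(e34)) <= tol
                           && std::fabs(length(e13) - length(e24)) <= tol;

    return parallel && orthogonal && equalLengths;
}
```

A check like this could be used to validate whatever a solver returns, since a wrong depth assignment usually breaks at least one of the constraints.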
@Rob: Yes, there are an infinite number of projections, but not an infinite number of projections where the points must satisfy the requirements of a rectangle.
@nlucaroni: Yes, this is only solvable if you have four points in the projection. If the rectangle projects to just 2 points (i.e. the plane of the rectangle is orthogonal to the projection surface), then this cannot be solved.
Hmmm... I should go home and write this little gem. This sounds like fun.
Updates:
D. DeMenthon devised an algorithm to compute the pose of an object (its position and orientation in space) from feature points in a 2D image when knowing the model of the object -- this is your exact problem:
We describe a method for finding the pose of an object from a single image. We assume that we can detect and match in the image four or more noncoplanar feature points of the object, and that we know their relative geometry on the object.
The algorithm is known as POSIT and is described in his classic article "Model-Based Object Pose in 25 Lines of Code" (available on his website, section 4).
Direct link to the article: http://www.cfar.umd.edu/~daniel/daniel_papersfordownload/Pose25Lines.pdf
OpenCV implementation: http://opencv.willowgarage.com/wiki/Posit
The idea is to repeatedly approximate the perspective projection by a scaled orthographic projection until it converges to an accurate pose.
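To show what that approximation means, here is a small sketch of the two projection models only. It is not an implementation of POSIT. It assumes a pinhole camera looking along +z with focal length f, and the names are hypothetical.

```cpp
struct Pt3 { double x, y, z; };
struct Pt2 { double u, v; };

// Full perspective projection: each point is divided by its own depth.
Pt2 projectPerspective(const Pt3& p, double f)
{
    return Pt2{f * p.x / p.z, f * p.y / p.z};
}

// Scaled orthographic projection: every point is divided by the depth z0
// of a single reference point, as if the whole object lay in the plane z = z0.
Pt2 projectScaledOrthographic(const Pt3& p, double f, double z0)
{
    return Pt2{f * p.x / z0, f * p.y / z0};
}
```

The two models agree exactly for points at the reference depth z0 and drift apart as a point's depth moves away from z0, so the approximation is good when the object is shallow compared to its distance from the camera. As described above, the iterative scheme starts from this approximation and refines it.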