Question
I'm trying to write a function that, given two cameras, their rotation and translation matrices, their focal point, and the coordinates of a point in each camera's image, triangulates the point into 3D space. Basically, it is given all the extrinsic/intrinsic values needed.
I'm familiar with the general idea: create two rays and find the point closest to both in the least-squares sense. However, I don't know exactly how to turn the given information into a set of equations that yield the point's 3D coordinates.
Answer 1:
Assume you have two cameras -- camera 1 and camera 2.
For each camera j = 1, 2 you are given:
1. The distance hj between its center Oj (is "focal point" the right term? Basically the point Oj from which the camera is looking at its screen) and the camera's screen. The camera's coordinate system is centered at Oj, the Oj--->x and Oj--->y axes are parallel to the screen, while the Oj--->z axis is perpendicular to the screen.
2. The 3 x 3 rotation matrix Uj and the 3 x 1 translation vector Tj which transform the Cartesian 3D coordinates with respect to the system of camera j (see point 1) to the world coordinates, i.e. the coordinates with respect to a third coordinate system from which all points in the 3D world are described.
3. On the screen of camera j, which is the plane parallel to the plane Oj-x-y and at a distance hj from the origin Oj, you have the 2D coordinates (let's say the x,y coordinates only) of point pj, where the two points p1 and p2 are in fact the projected images of the same point P, somewhere in 3D, onto the screens of camera 1 and 2 respectively. The projection is obtained by drawing the 3D line between point Oj and point P and defining point pj as the unique intersection point of this line with the screen of camera j. The equation of the screen in camera j's 3D coordinate system is z = hj, so the coordinates of point pj with respect to the 3D coordinate system of camera j look like pj = (xj, yj, hj) and so the 2D screen coordinates are simply pj = (xj, yj).
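The projection described in point 3 can be sketched in plain Python as follows. This is only an illustration under the conventions above, not part of the original answer: the function name `project` and the use of nested lists for matrices are assumptions made here. It relies on Uj being a rotation, so its inverse is its transpose.

```python
def project(P_world, U, T, h):
    """Project a world point onto a camera's screen; returns (x, y).

    Assumes the convention X_world = U * X_cam + T, with U a 3 x 3
    rotation given as nested lists and T a length-3 sequence.
    `project` is a hypothetical helper name, not from the answer.
    """
    # Invert the camera-to-world transform: X_cam = U^T * (X_world - T).
    d = [P_world[i] - T[i] for i in range(3)]
    # Row i of U^T is column i of U.
    x_cam = [sum(U[k][i] * d[k] for k in range(3)) for i in range(3)]
    if x_cam[2] <= 0:
        raise ValueError("point is not in front of the camera")
    # Scale the ray from Oj through P until it hits the screen z = h.
    s = h / x_cam[2]
    return (s * x_cam[0], s * x_cam[1])


# Camera at the world origin, axes aligned with the world axes, h = 2.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(project((1.0, 2.0, 4.0), identity, (0.0, 0.0, 0.0), 2.0))  # (0.5, 1.0)
```

A helper like this is mainly useful for generating synthetic screen points to check a triangulation routine against a known 3D point.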
Input: You are given the 2D points p1 = (x1, y1), p2 = (x2, y2), the two cameras' focal distances h1, h2, two 3 x 3 rotation matrices U1 and U2, and two 3 x 1 translation column vectors T1 and T2.
Output: The coordinates P = (x0, y0, z0) of point P in the world coordinate system.
One somewhat simple way to do this, avoiding homogeneous coordinates and projection matrices (which is fine too and more or less equivalent), is the following algorithm:
1. Form Q1 = [x1; y1; h1] and Q2 = [x2; y2; h2], where they are interpreted as 3 x 1 column vectors;
2. Transform P1 = U1*Q1 + T1 and P2 = U2*Q2 + T2, where * is matrix multiplication, here a 3 x 3 matrix multiplied by a 3 x 1 column, giving a 3 x 1 column;
3. Form the lines X = T1 + t1*(P1 - T1) and X = T2 + t2*(P2 - T2);
4. The two lines from the preceding step 3 either intersect at a common point, which is the point P, or they are skew lines, i.e. they do not intersect but are not parallel (not coplanar).
5. If the lines are skew lines, find the unique point X1 on the first line and the unique point X2 on the second line such that the vector X2 - X1 is perpendicular to both lines, i.e. X2 - X1 is perpendicular to both vectors P1 - T1 and P2 - T2. These two points X1 and X2 are the closest points on the two lines. Then point P = (X1 + X2)/2 can be taken as the midpoint of the segment X1 X2.
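The perpendicularity conditions in the last step reduce to a 2 x 2 linear system for t1 and t2. A worked version is given below; the shorthand d_1, d_2, w and D is introduced here for brevity and does not appear in the original answer.

```latex
\begin{aligned}
d_1 &= P_1 - T_1, \qquad d_2 = P_2 - T_2, \qquad w = T_2 - T_1,\\
X_1 &= T_1 + t_1 d_1, \qquad X_2 = T_2 + t_2 d_2,\\[4pt]
(X_2 - X_1)\cdot d_1 = 0 \;&\Longrightarrow\; (d_1\cdot d_1)\,t_1 - (d_1\cdot d_2)\,t_2 = w\cdot d_1,\\
(X_2 - X_1)\cdot d_2 = 0 \;&\Longrightarrow\; (d_1\cdot d_2)\,t_1 - (d_2\cdot d_2)\,t_2 = w\cdot d_2,\\[4pt]
D &= (d_1\cdot d_1)(d_2\cdot d_2) - (d_1\cdot d_2)^2,\\
t_1 &= \frac{(d_2\cdot d_2)(w\cdot d_1) - (d_1\cdot d_2)(w\cdot d_2)}{D},\qquad
t_2 = \frac{(d_1\cdot d_2)(w\cdot d_1) - (d_1\cdot d_1)(w\cdot d_2)}{D}.
\end{aligned}
```

By the Cauchy-Schwarz inequality, D is positive whenever the two lines are not parallel, so the system has a unique solution in that case. The same formulas also cover intersecting lines, where X1 and X2 coincide.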
In general, the two lines should pass very close to each other, so the two points X1 and X2 should be very close to each other.
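The steps above can be sketched in plain Python as follows. This is a minimal illustration under the answer's conventions, not a definitive implementation: the function name `triangulate_midpoint`, the `eps` tolerance and the choice to return the gap |X2 - X1| alongside P are all assumptions made here. It is written without dependencies; with NumPy the helpers would collapse to expressions like `U @ Q`.

```python
import math


def triangulate_midpoint(p1, h1, U1, T1, p2, h2, U2, T2, eps=1e-12):
    """Midpoint triangulation; returns (P, gap) with gap = |X2 - X1|.

    Hypothetical helper following the steps above. Uj are 3 x 3 rotations
    as nested lists, Tj are length-3 sequences, pj are (x, y) screen points.
    """
    def ray_direction(p, h, U):
        q = (p[0], p[1], h)                     # Qj = [xj; yj; hj]
        # Uj*Qj equals Pj - Tj, the direction of the ray through Tj.
        return [sum(U[i][k] * q[k] for k in range(3)) for i in range(3)]

    def dot(u, v):
        return sum(x * y for x, y in zip(u, v))

    d1 = ray_direction(p1, h1, U1)
    d2 = ray_direction(p2, h2, U2)
    w = [T2[i] - T1[i] for i in range(3)]

    # Requiring X2 - X1 to be perpendicular to d1 and d2 gives
    #   a*t1 - b*t2 = e   and   b*t1 - c*t2 = f.
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    e, f = dot(w, d1), dot(w, d2)
    det = a * c - b * b
    if det <= eps * a * c:
        raise ValueError("rays are (nearly) parallel; no unique closest points")
    t1 = (c * e - b * f) / det
    t2 = (b * e - a * f) / det

    X1 = [T1[i] + t1 * d1[i] for i in range(3)]
    X2 = [T2[i] + t2 * d2[i] for i in range(3)]
    P = [(X1[i] + X2[i]) / 2 for i in range(3)]
    gap = math.sqrt(sum((X2[i] - X1[i]) ** 2 for i in range(3)))
    return P, gap


# Two axis-aligned cameras one unit apart, both with h = 1, looking at
# the world point (0.5, 0.2, 2).
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
P, gap = triangulate_midpoint((0.25, 0.1), 1.0, identity, (0.0, 0.0, 0.0),
                              (-0.25, 0.1), 1.0, identity, (1.0, 0.0, 0.0))
print(P, gap)  # P is approximately [0.5, 0.2, 2.0], gap is close to 0
```

A large returned gap relative to the scene scale suggests the two screen points do not correspond to the same 3D point, or that the calibration data is inconsistent.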
Source: https://stackoverflow.com/questions/55740284/how-to-triangulate-a-point-in-3d-space-given-coordinate-points-in-2-image-and-e