OpenCV >> Structure from motion, triangulation


Question


Use-case


  1. Generate a synthetic 3D scene using random points
  2. Generate two synthetic cameras
  3. Get the 2D projections from the two cameras
  4. Derive the Fundamental & Essential matrices
  5. Derive rotation & translation using the Essential matrix
  6. Triangulate the two 2D projections to recover the initial 3D scene

Implementation


  1. Random 3D points ( x, y, z ) are generated
  2. Camera intrinsic matrix is statically defined
  3. Rotation matrix is statically defined ( 25 deg rotation on the Z-axis )
  4. Translation is set to zero ( no translation )
  5. Two projections are synthetically generated ( K*R*T )
  6. Fundamental matrix is resolved using cv::findFundamentalMat ( F )
  7. Essential matrix E is computed using 'K.t() * F * K'
  8. Camera extrinsics are extracted using SVD, resulting in 4 possible solutions ( in accordance with Hartley & Zisserman, Multiple View Geometry, chapter 9.2.6 )
  9. Triangulation is done using cv::triangulatePoints in the following manner: cv::triangulatePoints(K * matRotIdentity, K * R * T, v1, v2, points);
  10. 'points' is a 4-rows N-columns matrix with homogeneous coordinates ( x, y, z, w )
  11. 'points' is converted to inhomogeneous ( Euclidean ) coordinates by dividing 'x, y, z' by 'w'
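
A minimal C++ sketch of roughly this pipeline is shown below (OpenCV 3+). All numeric values (intrinsics, point cloud, baseline) are illustrative assumptions rather than the ones used above, and cv::recoverPose is used in place of the manual SVD decomposition of step 8. Note that a non-zero translation is used; see the EDIT below for why a rotation-only setup breaks the triangulation.

    // Sketch only: synthetic two-view reconstruction with OpenCV.
    #include <opencv2/opencv.hpp>
    #include <cmath>
    #include <cstdlib>
    #include <iostream>
    #include <vector>

    int main()
    {
        // 1. Random 3D points in front of the first camera
        std::vector<cv::Point3d> pts3d;
        for (int i = 0; i < 100; ++i)
            pts3d.emplace_back(rand() % 100 - 50, rand() % 100 - 50, rand() % 100 + 100);

        // 2. Intrinsic matrix (assumed values)
        cv::Mat K = (cv::Mat_<double>(3, 3) << 800, 0, 320,
                                               0, 800, 240,
                                               0,   0,   1);

        // 3. 25 deg rotation about the Z-axis, plus a unit baseline along X
        //    (a pure rotation would make the triangulation degenerate, see EDIT)
        double a = 25.0 * CV_PI / 180.0;
        cv::Mat R = (cv::Mat_<double>(3, 3) << std::cos(a), -std::sin(a), 0,
                                               std::sin(a),  std::cos(a), 0,
                                                         0,            0, 1);
        cv::Mat t = (cv::Mat_<double>(3, 1) << 1, 0, 0);

        // 5. Projection matrices P = K [R|t] and the two synthetic projections
        cv::Mat P1 = K * cv::Mat::eye(3, 4, CV_64F);   // first camera at the origin
        cv::Mat Rt, P2;
        cv::hconcat(R, t, Rt);
        P2 = K * Rt;

        std::vector<cv::Point2d> v1, v2;
        for (const auto& X : pts3d)
        {
            cv::Mat Xh = (cv::Mat_<double>(4, 1) << X.x, X.y, X.z, 1.0);
            cv::Mat x1 = P1 * Xh, x2 = P2 * Xh;
            v1.emplace_back(x1.at<double>(0) / x1.at<double>(2), x1.at<double>(1) / x1.at<double>(2));
            v2.emplace_back(x2.at<double>(0) / x2.at<double>(2), x2.at<double>(1) / x2.at<double>(2));
        }

        // 6./7. Fundamental matrix from correspondences, essential matrix from K
        cv::Mat F = cv::findFundamentalMat(v1, v2, cv::FM_8POINT);
        cv::Mat E = K.t() * F * K;

        // 8. Recover [R|t]; recoverPose picks the valid one of the four SVD
        //    solutions and returns t with unit norm, i.e. up to scale
        cv::Mat Rest, test;
        cv::recoverPose(E, v1, v2, K, Rest, test);

        // 9.-11. Triangulate and de-homogenize; points come out up to a global scale
        cv::Mat Pest, points4d;
        cv::hconcat(Rest, test, Pest);
        cv::Mat P2est = K * Pest;
        cv::triangulatePoints(P1, P2est, v1, v2, points4d);
        for (int i = 0; i < points4d.cols; ++i)
        {
            double w = points4d.at<double>(3, i);
            std::cout << points4d.at<double>(0, i) / w << " "
                      << points4d.at<double>(1, i) / w << " "
                      << points4d.at<double>(2, i) / w << "\n";
        }
        return 0;
    }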

The result


The resulting 3D points match the original points up to a scale factor ( ~144 in my case ).

Questions

  1. The camera translation is derived up to scale ( at step 8 ). With that in mind, would it be right to assume that the triangulation result is also up to scale?
  2. Is it possible to derive the scale without any prior knowledge of the camera positions or the absolute size of the points?

Any help would be appreciated.



EDIT:

I was trying to use the exact same projection matrices used for the 3D -> 2D projection to convert back from 2D to 3D ( using cv::triangulatePoints ). Surprisingly, this resulted in a null vector ( all 3D points had x, y, z, w == 0 ). This turned out to be because the two cameras differed only by rotation and not by translation, so the baseline had zero length: the back-projected rays from the two cameras coincide instead of intersecting, the epipolar plane degenerates, and the distance minimization ends up at x, y, z == 0, producing the null vector.

Adding a translation between the two cameras resulted in properly recovering the original coordinates. This, however, only holds because I was using the exact same projection matrices for the 3D to 2D projection and then for the 2D back to 3D triangulation.
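
As an illustration of this point (the names and numeric values below are assumptions, not the original code): projecting a single known point with two cameras that share the same centre and then triangulating with those exact matrices gives a degenerate result, while adding a baseline recovers the point.

    // Sketch only: zero vs. non-zero baseline in cv::triangulatePoints.
    #include <opencv2/opencv.hpp>
    #include <cmath>
    #include <iostream>
    #include <vector>

    int main()
    {
        cv::Mat K = (cv::Mat_<double>(3, 3) << 800, 0, 320, 0, 800, 240, 0, 0, 1);
        double a = 25.0 * CV_PI / 180.0;
        cv::Mat R = (cv::Mat_<double>(3, 3) << std::cos(a), -std::sin(a), 0,
                                               std::sin(a),  std::cos(a), 0,
                                                         0,            0, 1);

        const double baselines[] = { 0.0, 1.0 };   // rotation only vs. rotation + translation
        for (double tx : baselines)
        {
            cv::Mat P1 = K * cv::Mat::eye(3, 4, CV_64F);
            cv::Mat t = (cv::Mat_<double>(3, 1) << tx, 0, 0);
            cv::Mat Rt, P2;
            cv::hconcat(R, t, Rt);
            P2 = K * Rt;

            // Project one known 3D point (3, 2, 20) into both cameras
            cv::Mat X = (cv::Mat_<double>(4, 1) << 3, 2, 20, 1);
            cv::Mat x1 = P1 * X, x2 = P2 * X;
            std::vector<cv::Point2d> v1(1, cv::Point2d(x1.at<double>(0) / x1.at<double>(2),
                                                       x1.at<double>(1) / x1.at<double>(2)));
            std::vector<cv::Point2d> v2(1, cv::Point2d(x2.at<double>(0) / x2.at<double>(2),
                                                       x2.at<double>(1) / x2.at<double>(2)));

            // With tx == 0 the triangulation is degenerate (no meaningful point;
            // the post reports the null vector). With tx == 1, dividing by w
            // recovers (3, 2, 20) exactly, since the same matrices are reused.
            cv::Mat Xh;
            cv::triangulatePoints(P1, P2, v1, v2, Xh);
            std::cout << "tx = " << tx << "  ->  " << cv::Mat(Xh.t()) << std::endl;
        }
        return 0;
    }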

When doing camera pose estimation ( extracting the projection matrix from point correspondences ), the translation is derived only up to scale, and thus the triangulation result is also up to scale.

Question

Is it possible to derive the difference in translation ( how much the camera has moved ) in metric/pixel/... units rather than up to scale? What prior knowledge is needed?


Answer 1:


  1. Triangulated points are in the same coordinate system as the cameras which are used for triangulation...

  2. In a practical sense, No.




Answer 2:


Think of it as if you had images showing a room. You don't know whether the room is at a normal size for a human being or whether it is inside a matchbox.

To know the real size of the objects you have to know one of these measures:

  1. Size of one object in the scene
  2. Distance between the cameras (in your desired coordinate system)

For the latter, it is possible to take a least-squares result from GPS coordinates if you have many images with correct GPS coordinates. For that, you have to convert the GPS coordinates (WGS84) into a metric system, e.g. UTM.
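
A rough sketch of the second option (the function and variable names rescaleToMetric, tEstimated and baselineMeters are hypothetical, not part of this answer): once the metric distance between the two camera centres is known, e.g. from two GPS fixes converted to UTM, the up-to-scale reconstruction can simply be rescaled.

    // Sketch only: rescale an up-to-scale reconstruction using a known baseline.
    #include <opencv2/opencv.hpp>

    // points4d:       4xN CV_64F homogeneous output of cv::triangulatePoints
    // tEstimated:     3x1 translation recovered from the essential matrix (unit norm)
    // baselineMeters: real distance between the two camera centres, e.g. from GPS/UTM
    cv::Mat rescaleToMetric(const cv::Mat& points4d,
                            const cv::Mat& tEstimated,
                            double baselineMeters)
    {
        // Factor that stretches the estimated baseline to the metric one
        double s = baselineMeters / cv::norm(tEstimated);

        cv::Mat metric(3, points4d.cols, CV_64F);
        for (int i = 0; i < points4d.cols; ++i)
        {
            double w = points4d.at<double>(3, i);                          // de-homogenize ...
            metric.at<double>(0, i) = s * points4d.at<double>(0, i) / w;
            metric.at<double>(1, i) = s * points4d.at<double>(1, i) / w;
            metric.at<double>(2, i) = s * points4d.at<double>(2, i) / w;   // ... and scale
        }
        return metric;   // 3xN points in the first camera's frame, in metres
    }

With several images and GPS positions, the same idea extends to the least-squares fit the answer mentions: one global scale between the reconstructed camera centres and their UTM coordinates.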



Source: https://stackoverflow.com/questions/23253857/opencv-structure-from-motion-triangulation
