How to calculate Rotation and Translation matrices from homography?

星月不相逢 2020-12-30 17:35

I have already compared two images of the same scene, taken by one camera from different view angles (say left and right), using SURF in Emgu CV.

3 Answers
  • 2020-12-30 17:51

    A homography only applies to planar scenes (i.e., all of your points are coplanar). In that case the homography is a projective transformation, and it can be decomposed into its components.
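To make the planar case concrete, here is a minimal NumPy sketch of how a plane induces a homography between two views, using the standard relation H = K (R + t n^T / d) K^-1. The intrinsics `K`, the motion `R`, `t`, and the plane `n`, `d` are hypothetical values picked for illustration; none of them come from the question.

```python
import numpy as np

# Hypothetical intrinsics and relative pose, chosen only for illustration.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
theta = np.deg2rad(10.0)
R = np.array([[np.cos(theta), 0.0, np.sin(theta)],
              [0.0, 1.0, 0.0],
              [-np.sin(theta), 0.0, np.cos(theta)]])
t = np.array([0.2, 0.0, 0.05])

# Plane n . X = d in the first camera's frame (here: the plane Z = 5).
n = np.array([0.0, 0.0, 1.0])
d = 5.0

# Homography induced by the plane: H = K (R + t n^T / d) K^-1.
H = K @ (R + np.outer(t, n) / d) @ np.linalg.inv(K)


def project(K, X):
    """Project a 3D point given in camera coordinates to pixel coordinates."""
    x = K @ X
    return x[:2] / x[2]


# A point that lies on the plane, observed in both views.
X1 = np.array([0.7, -0.4, 5.0])
X2 = R @ X1 + t
p1 = project(K, X1)
p2 = project(K, X2)

# Map the first-view pixel through H and dehomogenise.
q = H @ np.array([p1[0], p1[1], 1.0])
p2_from_H = q[:2] / q[2]

print(np.allclose(p2_from_H, p2))
```

Decomposing a measured H goes in the opposite direction and generally yields several candidate (R, t, n) triples. OpenCV provides `decomposeHomographyMat` for that; whether your Emgu CV version wraps it is worth checking.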

    But if your scene isn't coplanar (which I think is the case from your description), it's going to take a bit more work. Instead of a homography, you need to calculate the fundamental matrix (which emgucv will do for you). The fundamental matrix combines the camera intrinsic matrix (K) with the relative rotation (R) and translation (t) between the two views. Recovering the rotation and translation is pretty straightforward if you know K. It looks like emgucv has methods for camera calibration. I am not familiar with their particular method, but these generally involve taking several images of a scene with known geometry.
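As a rough illustration of how K, R and t combine, the sketch below builds F = K^-T [t]x R K^-1 for a single camera used in both views and checks the epipolar constraint p2^T F p1 = 0 on points at different depths. All numeric values and the helper `skew` are assumptions made for the example.

```python
import numpy as np


def skew(v):
    """Return the cross-product matrix [v]x, so that skew(v) @ u == np.cross(v, u)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])


# Hypothetical intrinsics and relative pose (same camera in both views).
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
theta = np.deg2rad(10.0)
R = np.array([[np.cos(theta), 0.0, np.sin(theta)],
              [0.0, 1.0, 0.0],
              [-np.sin(theta), 0.0, np.cos(theta)]])
t = np.array([0.2, 0.0, 0.05])

# Essential matrix from the motion, then fundamental matrix via the intrinsics.
E = skew(t) @ R
K_inv = np.linalg.inv(K)
F = K_inv.T @ E @ K_inv

# Non-coplanar 3D points in the first camera's frame.
points = np.array([[0.7, -0.4, 5.0],
                   [-1.0, 0.3, 8.0],
                   [0.2, 0.9, 3.0]])

residuals = []
for X1 in points:
    X2 = R @ X1 + t
    p1 = K @ X1 / X1[2]   # homogeneous pixel in view 1
    p2 = K @ X2 / X2[2]   # homogeneous pixel in view 2
    residuals.append(float(p2 @ F @ p1))

# With K known, the essential matrix is recovered as K^T F K.
E_back = K.T @ F @ K
```

In practice F is estimated from matched features rather than built from a known pose; the point here is only that, once K is known, E (and from it R and t) follows from F.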

  • It's been a while since you asked this question. By now, there are some good references on this problem.

    One of them is "An Invitation to 3-D Vision" by Ma et al.; chapter 5 is freely available at http://vision.ucla.edu//MASKS/chapters.html

    Also, Peter Corke's Machine Vision Toolbox includes tools to perform this. However, he does not explain much of the math behind the decomposition.

  • 2020-12-30 17:55

    To recover the camera motion (exact rotation, and translation up to a scale factor), you need to:

    • Calculate the fundamental matrix F, for example with the eight-point algorithm.
    • Calculate the essential matrix E = A^T F A, where A is the intrinsic camera matrix.
    • Decompose E, which is by definition Tx * R, via SVD into E = U L V^T.
    • Create a special 3x3 matrix

          0 -1  0   
      W = 1  0  0      
          0  0  1  
      

    that helps carry out the decomposition:

    R = U W^{-1} V^T, Tx = U L W U^T, where

          0  -tz   ty
    Tx =  tz   0  -tx
         -ty   tx   0
    
    • Since E is only defined up to sign and W can be replaced by W^{-1}, there are 4 distinct solutions; select the one that places the most points in front of the camera.
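The steps above can be sketched in NumPy as follows. This is an illustrative sketch under stated assumptions, not a drop-in replacement for a library routine: the function names (`candidate_poses`, `triangulate`, `pick_pose`) are hypothetical, the image points are assumed to be already normalised by the inverse intrinsic matrix with a third coordinate of 1, and the data is synthetic and noise-free.

```python
import numpy as np

W = np.array([[0.0, -1.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0]])


def skew(v):
    """Return the cross-product matrix [v]x."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])


def candidate_poses(E):
    """Return the four (R, t) candidates for an essential matrix E, with |t| = 1."""
    U, _, Vt = np.linalg.svd(E)
    # E is only defined up to sign, so flip factors to get proper rotations (det = +1).
    if np.linalg.det(U) < 0:
        U = -U
    if np.linalg.det(Vt) < 0:
        Vt = -Vt
    t = U[:, 2]  # left null vector of E: translation direction, sign unknown
    # W.T equals W^{-1} because W is a rotation.
    return [(R, s * t) for R in (U @ W @ Vt, U @ W.T @ Vt) for s in (1.0, -1.0)]


def triangulate(R, t, x1, x2):
    """Linear (DLT) triangulation of one point from normalised coordinates (x, y, 1)."""
    P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = np.hstack([R, t.reshape(3, 1)])
    A = np.vstack([x1[0] * P1[2] - P1[0],
                   x1[1] * P1[2] - P1[1],
                   x2[0] * P2[2] - P2[0],
                   x2[1] * P2[2] - P2[1]])
    X = np.linalg.svd(A)[2][-1]
    return X[:3] / X[3]


def pick_pose(E, pts1, pts2):
    """Pick the candidate that puts the most points in front of both cameras."""
    def score(pose):
        R, t = pose
        count = 0
        for x1, x2 in zip(pts1, pts2):
            X = triangulate(R, t, x1, x2)
            if X[2] > 0 and (R @ X + t)[2] > 0:
                count += 1
        return count
    return max(candidate_poses(E), key=score)


# Synthetic check with a hypothetical ground-truth motion.
theta = np.deg2rad(10.0)
R_true = np.array([[np.cos(theta), 0.0, np.sin(theta)],
                   [0.0, 1.0, 0.0],
                   [-np.sin(theta), 0.0, np.cos(theta)]])
t_true = np.array([0.2, 0.0, 0.05])
X = np.array([[0.7, -0.4, 5.0], [-1.0, 0.3, 8.0],
              [0.2, 0.9, 3.0], [1.5, 1.0, 6.0]])
pts1 = X / X[:, 2:3]
X_cam2 = X @ R_true.T + t_true
pts2 = X_cam2 / X_cam2[:, 2:3]

E = skew(t_true) @ R_true
R_est, t_est = pick_pose(E, pts1, pts2)
```

The recovered translation has unit length, which reflects the scale ambiguity mentioned at the top of this answer. With real, noisy matches, E should first be projected onto the nearest valid essential matrix (two equal singular values and one zero) before decomposing.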