How to calculate Rotation and Translation matrices from homography?

Question

I have already matched two images of the same scene, taken by one camera from different view angles (say left and right), using SURF in emgucv (C#). That gave me a 3x3 homography matrix for the 2D transformation. Now I want to place those two images in a 3D environment (using DirectX). To do that I need to calculate the location and orientation of the 2nd image (right) relative to the 1st image (left) in 3D. How can I calculate the rotation and translation matrices for the 2nd image?

I also need the z value for the 2nd image.

I read about something called 'homography decomposition'. Is that the way to do it?

Is anybody familiar with homography decomposition, and is there an algorithm that implements it?

Thanks in advance for any help.

Answer 1:

A homography only works for planar scenes (i.e. all of your points are coplanar). If that is the case, then the homography is a projective transformation and it can be decomposed into its components.
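To illustrate why a planar homography carries the camera motion, here is a minimal sketch in plain Python. It assumes normalized image coordinates (intrinsics already removed), the convention X2 = R*X1 + t, and a plane n . X1 = d expressed in the first camera's frame; under those assumptions the induced homography is H = R + t*n^T/d. The helper names and all numbers are made up for illustration.

```python
import math

def matvec(M, v):
    """3x3 matrix times 3-vector."""
    return [sum(M[i][j] * v[j] for j in range(3)) for i in range(3)]

def planar_homography(R, t, n, d):
    """H = R + t n^T / d for points on the plane n . X1 = d (camera-1 frame)."""
    return [[R[i][j] + t[i] * n[j] / d for j in range(3)] for i in range(3)]

def dehomogenize(v):
    """Homogeneous 3-vector -> 2D image point."""
    return (v[0] / v[2], v[1] / v[2])

# Hypothetical motion: a small rotation about the y axis plus a translation.
c, s = math.cos(0.2), math.sin(0.2)
R = [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]
t = [0.5, 0.0, 0.1]
n, d = [0.0, 0.0, 1.0], 4.0          # the plane z = 4 as seen from camera 1

X1 = [1.0, -2.0, 4.0]                 # a 3D point on that plane (camera-1 frame)
X2 = [a + b for a, b in zip(matvec(R, X1), t)]   # same point in camera 2

H = planar_homography(R, t, n, d)
x1 = [X1[0] / X1[2], X1[1] / X1[2], 1.0]          # normalized image point, view 1
via_h = dehomogenize(matvec(H, x1))   # view-2 point predicted by the homography
direct = dehomogenize(X2)             # view-2 point from projecting X2 directly
print(via_h, direct)                  # the two should agree up to rounding
```

Decomposing a measured H inverts this relation. Only the ratio t/d appears in H, so the translation can be recovered only up to scale.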

But if your scene isn't coplanar (which I think is the case from your description) then it's going to take a bit more work. Instead of a homography you need to calculate the fundamental matrix (which emgucv will do for you). The fundamental matrix combines the camera intrinsic matrix (K) with the relative rotation (R) and translation (t) between the two views. Recovering the rotation and translation is pretty straightforward if you know K. It looks like emgucv has methods for camera calibration. I am not familiar with their particular method, but these generally involve taking several images of a scene with known geometry.

Answer 2:

To figure out the camera motion (exact rotation, and translation up to a scaling factor) you need to:

1. Calculate the fundamental matrix F, for example using the eight-point algorithm.
2. Calculate the essential matrix E = A^T F A, where A is the intrinsic camera matrix.
3. Decompose E, which is by definition Tx * R, via SVD into E = U L V^T.
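The step from F to E can be sketched in plain Python as below. This is only an illustration: it assumes both images share the same intrinsic matrix A, the intrinsics (focal length 800 px, principal point (320, 240)) are invented, and F is synthesized from a known E rather than estimated with the eight-point algorithm.

```python
def transpose(M):
    """Transpose of a 3x3 matrix given as nested lists."""
    return [list(row) for row in zip(*M)]

def matmul(A, B):
    """Product of two 3x3 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def essential_from_fundamental(F, A):
    """E = A^T F A, assuming both views share the intrinsic matrix A."""
    return matmul(matmul(transpose(A), F), A)

# Hypothetical intrinsics and their (hand-written) inverse.
A = [[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]]
A_inv = [[1 / 800.0, 0.0, -320.0 / 800.0],
         [0.0, 1 / 800.0, -240.0 / 800.0],
         [0.0, 0.0, 1.0]]

# Ground-truth essential matrix for R = identity, t = (1, 0, 0).
E_true = [[0.0, 0.0, 0.0], [0.0, 0.0, -1.0], [0.0, 1.0, 0.0]]

# Synthesize the matching fundamental matrix F = A^-T E A^-1, then undo it.
F = matmul(matmul(transpose(A_inv), E_true), A_inv)
E = essential_from_fundamental(F, A)
for row in E:
    print(row)
```

An F estimated from real matches is only defined up to scale (and sign), so the E obtained this way is too.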

Create a special 3x3 matrix

    0 -1  0   
W = 1  0  0      
    0  0  1

that helps to run decomposition:

R = UW^-1V^T, Tx = ULWU^T, where

      0  -tz   ty
Tx =  tz   0  -tx
     -ty  tx    0
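As a sanity check on Tx, the sketch below builds the cross-product matrix from a translation vector, forms E = Tx * R, and evaluates the epipolar constraint x2^T E x1 for one synthetic correspondence. It assumes the convention X2 = R*X1 + t; the rotation, translation and 3D point are hypothetical values.

```python
import math

def skew(t):
    """Cross-product matrix Tx, so that Tx * v equals the cross product t x v."""
    tx, ty, tz = t
    return [[0.0, -tz, ty], [tz, 0.0, -tx], [-ty, tx, 0.0]]

def matvec(M, v):
    """3x3 matrix times 3-vector."""
    return [sum(M[i][j] * v[j] for j in range(3)) for i in range(3)]

def matmul(A, B):
    """Product of two 3x3 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def dot(a, b):
    """Dot product of two 3-vectors."""
    return sum(x * y for x, y in zip(a, b))

# Hypothetical motion: small rotation about the z axis plus a translation.
c, s = math.cos(0.1), math.sin(0.1)
R = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
t = [1.0, 0.2, -0.1]
E = matmul(skew(t), R)

# One 3D point seen from both cameras (homogeneous scale does not matter here).
X1 = [0.3, -0.2, 5.0]
X2 = [a + b for a, b in zip(matvec(R, X1), t)]

residual = dot(X2, matvec(E, X1))   # x2^T E x1
print(residual)                     # zero up to floating-point rounding
```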

Since E can have an arbitrary sign and W can be replaced by W^-1, there are 4 distinct solutions, and we have to select the one that puts the most points in front of both cameras.
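The selection among the four candidates could look roughly like the sketch below. It takes the candidate (R, t) pairs as given, triangulates each correspondence by least squares under the convention z2*x2 = z1*R*x1 + t (normalized homogeneous image points), and keeps the pair with the most positive depths. The function names are illustrative, and the demo fabricates the second rotation candidate by a 180 degree turn about the baseline; in practice both rotations would come out of the SVD step.

```python
def matvec(M, v):
    """3x3 matrix times 3-vector."""
    return [sum(M[i][j] * v[j] for j in range(3)) for i in range(3)]

def dot(a, b):
    """Dot product of two 3-vectors."""
    return sum(x * y for x, y in zip(a, b))

def depths(R, t, x1, x2):
    """Least-squares depths (z1, z2) satisfying z2*x2 = z1*R*x1 + t."""
    a = matvec(R, x1)                      # ray of view 1 expressed in view 2
    # Normal equations of [a, -x2] [z1, z2]^T = -t.
    aa, ab, bb = dot(a, a), -dot(a, x2), dot(x2, x2)
    ra, rb = -dot(a, t), dot(x2, t)
    det = aa * bb - ab * ab                # zero only if the rays are parallel
    return (ra * bb - ab * rb) / det, (aa * rb - ab * ra) / det

def count_in_front(R, t, pts1, pts2):
    """Number of correspondences with positive depth in both cameras."""
    count = 0
    for x1, x2 in zip(pts1, pts2):
        z1, z2 = depths(R, t, x1, x2)
        if z1 > 0 and z2 > 0:
            count += 1
    return count

def pick_pose(candidates, pts1, pts2):
    """Return the (R, t) candidate with the most points in front of both views."""
    return max(candidates, key=lambda c: count_in_front(c[0], c[1], pts1, pts2))

# Hypothetical ground truth: no rotation, camera shifted along x.
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t = [1.0, 0.0, 0.0]
points3d = [[0.0, 0.0, 5.0], [1.0, -1.0, 4.0], [-2.0, 0.5, 6.0]]
pts1 = [[X[0] / X[2], X[1] / X[2], 1.0] for X in points3d]
pts2 = []
for X in points3d:
    Y = [a + b for a, b in zip(matvec(R, X), t)]
    pts2.append([Y[0] / Y[2], Y[1] / Y[2], 1.0])

# Demo-only construction of the other candidates (baseline is the x axis here).
R_twist = [R[0], [-v for v in R[1]], [-v for v in R[2]]]
t_neg = [-v for v in t]
candidates = [(R, t_neg), (R_twist, t), (R, t), (R_twist, t_neg)]

best_R, best_t = pick_pose(candidates, pts1, pts2)
print(best_R, best_t)
```

With noisy matches a few points can land behind a camera even for the right pose, which is why the count is maximized rather than requiring every point to pass.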

Answer 3:

It's been a while since you asked this question. By now, there are some good references on this problem.

One of them is "An Invitation to 3-D Vision" by Ma et al.; chapter 5 of it is freely available here: http://vision.ucla.edu//MASKS/chapters.html

Also, Peter Corke's Vision Toolbox includes tools to perform this. However, he does not explain much of the math behind the decomposition.

Source: https://stackoverflow.com/questions/9271150/how-to-calculate-rotation-and-translation-matrices-from-homography

Tags

computer-vision

emgucv

homography