I have already done the comparison of two images of the same scene, taken by one camera from different viewing angles (say left and right), using SURF in Emgu CV.
Homography only works for planar scenes (i.e. all of your points are coplanar). If that is the case, then the homography is a projective transformation and it can be decomposed into its components: a rotation, a translation, and the normal of the plane.
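If your scene really is planar, OpenCV (which Emgu CV wraps) can do that decomposition for you. A minimal Python/OpenCV sketch, assuming you already have the 3x3 homography H from your SURF matches and the intrinsic matrix K (both are placeholders here):

    import cv2

    def decompose_planar_homography(H, K):
        """Split a plane-induced homography into candidate motions."""
        # Returns up to four (R, t, plane normal) candidates; the physically
        # valid one is usually picked by requiring positive depth for the
        # matched points.
        n_solutions, Rs, ts, normals = cv2.decomposeHomographyMat(H, K)
        return Rs, ts, normals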
But if your scene isn't coplanar (which I think is the case from your description), then it's going to take a bit more work. Instead of a homography you need to calculate the fundamental matrix (which Emgu CV will do for you). The fundamental matrix combines the camera intrinsic matrix (K) with the relative rotation (R) and translation (t) between the two views. Recovering the rotation and translation is pretty straightforward if you know K. It looks like Emgu CV has methods for camera calibration. I am not familiar with their particular method, but these generally involve taking several images of a scene with known geometry.
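As a rough illustration of that pipeline, here is a Python/OpenCV sketch (Emgu CV wraps the same functions, so the steps carry over); pts_left, pts_right and K are placeholders for your matched SURF coordinates and your calibration result:

    import cv2
    import numpy as np

    def relative_pose_from_matches(pts_left, pts_right, K):
        """Recover R and t (t only up to scale) for a non-planar scene."""
        pts_left = np.asarray(pts_left, dtype=np.float64)
        pts_right = np.asarray(pts_right, dtype=np.float64)

        # Robustly estimate the fundamental matrix from the SURF matches.
        F, mask = cv2.findFundamentalMat(pts_left, pts_right, cv2.FM_RANSAC)

        # Fold in the intrinsics to get the essential matrix, then let
        # OpenCV pick the physically valid (R, t) pair.
        E = K.T @ F @ K
        _, R, t, _ = cv2.recoverPose(E, pts_left, pts_right, K)
        return R, t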
It's been a while since you asked this question. By now, there are some good references on this problem.
One of them is "An Invitation to 3-D Vision" by Ma et al.; chapter 5 of it is freely available at http://vision.ucla.edu//MASKS/chapters.html
Also, Peter Corke's Machine Vision Toolbox includes tools to perform this; however, it does not explain much of the math behind the decomposition.
To figure out the camera motion (the exact rotation, and the translation up to a scaling factor) you need to take the SVD of the essential matrix E = K^T * F * K, i.e. E = U * L * V^T with L = diag(1, 1, 0), and create a special 3x3 matrix

         0 -1  0
    W =  1  0  0
         0  0  1

that helps to run the decomposition:

    R = U * W^-1 * V^T,    Tx = U * L * W * U^T,

where Tx is the skew-symmetric matrix built from the translation t = (tx, ty, tz):

           0  -tz   ty
    Tx =   tz   0  -tx
          -ty   tx   0

The SVD actually yields two choices of R (W vs. W^-1), and t is only determined up to sign; the correct combination is the one that puts the triangulated points in front of both cameras.
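A small numpy sketch of exactly that recipe (illustration only, not Emgu CV code; E is the essential matrix obtained as K^T * F * K):

    import numpy as np

    def decompose_essential(E):
        """Decompose an essential matrix into R and the translation direction t."""
        U, S, Vt = np.linalg.svd(E)

        # Keep U and V proper rotations so that det(R) = +1.
        if np.linalg.det(U) < 0:
            U = -U
        if np.linalg.det(Vt) < 0:
            Vt = -Vt

        L = np.diag([1.0, 1.0, 0.0])        # ideal singular values of E
        W = np.array([[0.0, -1.0, 0.0],
                      [1.0,  0.0, 0.0],
                      [0.0,  0.0, 1.0]])    # the special matrix from above

        R  = U @ W.T @ Vt                   # R  = U * W^-1 * V^T  (W^-1 == W^T)
        Tx = U @ L @ W @ U.T                # Tx = U * L * W * U^T  (skew-symmetric)
        t  = np.array([Tx[2, 1], Tx[0, 2], Tx[1, 0]])   # (tx, ty, tz) read off Tx

        # Swapping W.T for W (and negating t) gives the other candidates;
        # test them by triangulating a few points and keeping the pair
        # with positive depth in both cameras.
        return R, t

With real, noisy matches E will not have singular values exactly (1, 1, 0); using the ideal L above is the usual way to project it back onto a valid essential matrix, and in practice OpenCV's recoverPose does all of this, including the in-front-of-camera test, in one call.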