I have the following image I1. I did not capture it. I downloaded it from Google.
I apply a known homography h to I1 to obtain the following image I2.
The difficulty you are having is that your homography matrix h does not correspond well with a projection obtained with a sensible perspective camera. I think there is a simpler approach.
Fundamentally, you need to be very clear about your technical goal, and separate this from your approach for solving it. Always do this whenever you tackle any vision problem.
So let's be clear about the technical goal. You have a top-down image of a planar surface (also called a rectified view). Normally you would call this surface the model, defined on the plane z=0. You want to render this model. Specifically you want to do the following:
For simplicity I'm going to use T(R,t) to denote the 4x4 homogeneous rigid transform for some rotation R and translation t. The model-to-camera transform at stage 3 is therefore given by T=T(R2, (0,0,0)) x T(R1, t1).
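As a rough sketch of the composition above, here is how T could be built with NumPy. The helper names (rigid_transform, rotation_x) and the numeric values for R1, t1 and R2 are placeholders I made up for illustration; substitute your own viewpoint.

```python
import numpy as np

def rigid_transform(R, t):
    """Return the 4x4 homogeneous rigid transform T(R, t)."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

def rotation_x(angle_rad):
    """3x3 rotation about the x-axis (hypothetical helper for this example)."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    return np.array([[1.0, 0.0, 0.0],
                     [0.0,   c,  -s],
                     [0.0,   s,   c]])

# Illustrative values only: these are assumptions, not values from the question.
R1 = np.eye(3)                      # first rotation
t1 = np.array([0.0, 0.0, 5.0])      # first translation
R2 = rotation_x(np.deg2rad(30.0))   # second rotation, applied with zero translation

# Model-to-camera transform: T = T(R2, (0,0,0)) x T(R1, t1)
T = rigid_transform(R2, np.zeros(3)) @ rigid_transform(R1, t1)
```

Note that the product has rotation R2 R1 and translation R2 t1, which is a quick sanity check on the order of multiplication.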
There are two good ways to create I2:
Use a rendering engine such as OpenGL or Ogre. The advantage of this is that it can be easy to make a GUI for changing the camera viewpoint and other complex rendering effects can be added.
Determine the model-to-image homography matrix and render with OpenCV using warpPerspective. The advantage of this is that it can be done in a few lines without breaking into rendering software. The disadvantage is that you can get some strange effects if the homography has a vanishing point in the render (as you are observing). More on that point later.
To use the OpenCV approach we define the model-to-image homography as H2. This can be defined in terms of the camera parameters. Consider a point p=(x,y,1) on the model plane in homogeneous coordinates. Its position q in I2 in homogeneous coordinates is given by q=K M p, where M is a 3x3 matrix given by M=(T00,T01,T03; T10,T11,T13; T20,T21,T23). Here Tij is the entry of T at row i, column j (zero-indexed), so M is the top three rows of T with the third column removed. That column can be dropped because every model point has z=0. This is straightforward to derive using the perspective camera model. Consequently, we now have that H2 = K M.
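Here is a minimal sketch of H2 = K M in NumPy, with a check that it agrees with projecting the 3D model point (x, y, 0) through the full camera model. The function name homography_from_camera and all numeric values for K and T are assumptions chosen for illustration.

```python
import numpy as np

def homography_from_camera(K, T):
    """Model-to-image homography H = K M for a model on the plane z=0.

    M is the top three rows of the 4x4 transform T with column 2 removed,
    i.e. columns 0, 1 and 3.
    """
    M = T[:3, [0, 1, 3]]
    return K @ M

# Illustrative camera (assumed values, not from the question).
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])

angle = np.deg2rad(40.0)
c, s = np.cos(angle), np.sin(angle)
T = np.eye(4)
T[:3, :3] = np.array([[1.0, 0.0, 0.0],
                      [0.0,   c,  -s],
                      [0.0,   s,   c]])   # tilt about the x-axis
T[:3, 3] = [0.0, 0.0, 1000.0]             # model placed in front of the camera

H2 = homography_from_camera(K, T)

# Sanity check: the homography and the full perspective model should agree.
x, y = 50.0, -20.0
q_full = K @ (T @ np.array([x, y, 0.0, 1.0]))[:3]   # full 3D projection
q_homog = H2 @ np.array([x, y, 1.0])                # planar homography

pixel_full = q_full[:2] / q_full[2]
pixel_homog = q_homog[:2] / q_homog[2]
```

If the model coordinates are simply the pixel coordinates of I1, the render should then be something like cv2.warpPerspective(I1, H2, (width, height)), where width and height are the output image size you choose.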
Now we have to instantiate the homography. Unlike your proposed approach, I would define it using a particular camera configuration, by specifying K, R1, t1, R2. The choice is up to you! To simplify the definition of K you can use a simple form with one free parameter (focal length), and set the principal point to the image centre. For typical cameras f ranges between 0.5 and 2 times the image width, but it's up to you. You then need to set R1 and t1 depending on the viewing angle/distance that you want for your viewpoint.
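The one-parameter form of K could look like the sketch below. The function name simple_intrinsics, the focal_scale argument and the 640x480 image size are my own placeholders; it also assumes square pixels and zero skew.

```python
import numpy as np

def simple_intrinsics(width, height, focal_scale=1.0):
    """Intrinsic matrix with a single free parameter.

    The focal length is f = focal_scale * width (in pixels), and the
    principal point is fixed at the image centre. Square pixels and
    zero skew are assumed.
    """
    f = focal_scale * width
    return np.array([[f,   0.0, width / 2.0],
                     [0.0, f,   height / 2.0],
                     [0.0, 0.0, 1.0]])

# Example: a 640x480 render with f equal to the image width.
K = simple_intrinsics(640, 480, focal_scale=1.0)
```

A focal_scale near 0.5 gives a wide-angle look with strong perspective, while a value near 2 gives a narrower, flatter view.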
I want to emphasize that this does not contradict any of the previous answers I have given. It is simply a different approach which may be easier to manage. Essentially, here I am proposing to define your homography directly using camera parameters (which you set as you want). This guarantees you are using a sensible intrinsic matrix (because you set it yourself). It is different to your approach where you first create a homography and then want to find the matching camera parameters (which may or may not be physically sensible).