There have been previous questions (here, here and here) related to mine; however, mine has a different aspect that I have not seen addressed in any of the previous posts.
This is for those of you who are experiencing the same problem. I thought it might help to share what I have found out.
As noted by Bill, camera calibration is the best solution to this problem.
However, I found that the two images can also be aligned using homographies and epipolar geometry. This requires at least 8 matching features in both images, which is difficult to obtain when dealing with depth images.
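To make the homography part concrete, here is a minimal numpy-only sketch of estimating a homography from matched point pairs with the standard DLT algorithm, and applying it to warp points from one image's coordinates into the other's. In practice you would get the matched features from a detector such as SIFT or ORB; the function names here are my own, not from any particular library.

```python
import numpy as np

def estimate_homography(src, dst):
    """Estimate the 3x3 homography H mapping src -> dst via DLT.
    src, dst: (N, 2) arrays of matched points, N >= 4."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        # Each correspondence contributes two linear constraints on H.
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # H (flattened) is the right singular vector with the smallest
    # singular value, i.e. the (approximate) null vector of A.
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]  # normalize so H[2,2] == 1

def apply_homography(H, pts):
    """Apply H to (N, 2) points, returning the (N, 2) mapped points."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = pts_h @ H.T
    return mapped[:, :2] / mapped[:, 2:3]  # divide out homogeneous scale
```

Note that a single homography is only exact when the scene is planar or the cameras share a centre; for two physically offset cameras viewing a 3-D scene (as with the Kinect's depth and RGB sensors), the correct mapping depends on depth, which is why full calibration works better.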
There have been several attempts to calibrate these images, which can be found here and here; both require a calibration pattern. What I was trying to achieve was to align already-captured depth and RGB images, which is possible provided I obtain the calibration parameters from the same Kinect sensor that I used to record them.
I have found that the best way to get around this problem is to align the two images using the built-in registration functions in OpenNI and the Kinect SDK.
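For intuition, what those registration functions do internally is reproject each depth pixel into the RGB camera's frame using the factory calibration. A rough numpy sketch of that reprojection is below; the intrinsics and the extrinsic baseline are placeholder values I made up for illustration, not the real factory calibration of any Kinect.

```python
import numpy as np

# Assumed pinhole intrinsics (fx, fy, cx, cy) -- placeholders only.
DEPTH_K = dict(fx=570.0, fy=570.0, cx=320.0, cy=240.0)
RGB_K = dict(fx=520.0, fy=520.0, cx=320.0, cy=240.0)
# Assumed depth->RGB extrinsics: identity rotation, ~2.5 cm baseline along x.
R = np.eye(3)
t = np.array([0.025, 0.0, 0.0])

def depth_pixel_to_rgb_pixel(u, v, z):
    """Map a depth pixel (u, v) with depth z (metres) to RGB pixel coords."""
    # 1. Deproject: depth pixel -> 3-D point in the depth camera frame.
    x = (u - DEPTH_K['cx']) * z / DEPTH_K['fx']
    y = (v - DEPTH_K['cy']) * z / DEPTH_K['fy']
    p = np.array([x, y, z])
    # 2. Transform the point into the RGB camera frame.
    p_rgb = R @ p + t
    # 3. Project with the RGB camera intrinsics.
    u_rgb = RGB_K['fx'] * p_rgb[0] / p_rgb[2] + RGB_K['cx']
    v_rgb = RGB_K['fy'] * p_rgb[1] / p_rgb[2] + RGB_K['cy']
    return u_rgb, v_rgb
```

Because the pixel offset produced by the baseline `t` scales with `1/z`, nearby points shift more than distant ones, which is exactly why no single global warp can align the two images for all depths.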
In general what you are trying to do from a pair of RGB and Depth images is non-trivial and ill-defined. As humans we recognise the arm in the RGB image, and are able to relate it to the area of the depth image closer to the camera. However, a computer has no prior knowledge about which parts of the RGB image it expects to correspond to which parts of the depth image.
The reason most algorithms for such alignment use camera calibration is that this process allows this ill-posed problem to become well-posed.
However, there may still be ways to find the correspondences, particularly if you have lots of image pairs from the same Kinect. You then need only search for one set of transformation parameters. I don't know of any existing algorithms to do this, but as you note in your question you may find something like doing edge detection on both images and trying to align the edge images a good place to start.
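A very simple version of that edge-based idea, assuming the misalignment can be approximated by a pure integer translation (the real Kinect misalignment is depth-dependent, so this is only a starting point): compute gradient-magnitude edge maps for both images, then brute-force the shift that maximizes their correlation. All function names here are my own.

```python
import numpy as np

def edge_map(img):
    """Gradient-magnitude edge map via central finite differences."""
    img = img.astype(float)
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[:, 1:-1] = img[:, 2:] - img[:, :-2]
    gy[1:-1, :] = img[2:, :] - img[:-2, :]
    return np.hypot(gx, gy)

def best_shift(edges_a, edges_b, max_shift=10):
    """Brute-force the (dy, dx) shift of edges_b that best correlates
    with edges_a, searching a (2*max_shift+1)^2 window."""
    best, best_score = (0, 0), -np.inf
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            shifted = np.roll(np.roll(edges_b, dy, axis=0), dx, axis=1)
            score = float(np.sum(edges_a * shifted))
            if score > best_score:
                best_score, best = score, (dy, dx)
    return best
```

A more robust version would use a proper edge detector (e.g. Canny), allow sub-pixel or affine transforms, and weight the search by depth, but the exhaustive search above illustrates the basic "align the edge images" strategy.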
Finally, note that when objects get close to the Kinect the correspondence between RGB and depth images can become poor, even after the images have been calibrated. You can see some of this effect in your images - the 'shadow' that the hand makes in your example depth image is somewhat indicative of this.