Recently, I have been studying the pinhole camera model, but I am confused by the model provided by OpenCV and the one in the "Multiple View Geometry in Computer Vision" tex
It is actually much simpler: the coordinates of your object are supposed to be given in camera coordinates, i.e. a coordinate system whose origin is at the camera center and whose x- and y-axes are parallel to the respective axes of the image plane, with the z-axis pointing along the optical axis. See e.g. here: http://homepages.inf.ed.ac.uk/rbf/CVonline/LOCAL_COPIES/OWENS/LECT9/node2.html
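
To make this concrete, here is a minimal sketch of the pinhole projection for a point that is already expressed in camera coordinates. The function name `project_point` and the intrinsic values (`fx`, `fy`, `cx`, `cy`) are made up for illustration, and lens distortion is ignored; this is not OpenCV's own code, only the same basic formula written out by hand.

```python
def project_point(point_cam, fx, fy, cx, cy):
    """Project a 3D point given in camera coordinates to pixel coordinates.

    Assumes an ideal pinhole camera without lens distortion.
    fx, fy: focal lengths in pixels; cx, cy: principal point in pixels.
    """
    X, Y, Z = point_cam
    if Z <= 0:
        raise ValueError("point must lie in front of the camera (Z > 0)")

    # Perspective division: normalized image-plane coordinates.
    x = X / Z
    y = Y / Z

    # Scale by the focal length and shift by the principal point.
    u = fx * x + cx
    v = fy * y + cy
    return u, v


# Hypothetical intrinsics for a 640x480 image.
u, v = project_point((0.5, -0.25, 2.0), fx=800.0, fy=800.0, cx=320.0, cy=240.0)
print(u, v)  # 520.0 140.0
```

Because the x- and y-axes of the camera frame are parallel to the image axes, no rotation is needed at this stage: a point on the optical axis (X = Y = 0) lands exactly on the principal point. If your object is given in some other world frame, you would first have to transform it into the camera frame with the extrinsic rotation and translation.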