How do orthographic and perspective camera models in structure from motion differ from each other?

时间秒杀一切 submitted on 2019-12-12 02:45:44

Question


When can the camera model be assumed to be orthographic, and how do orthographic and perspective camera models differ in structure from motion?

Also, how do these techniques differ from each other?


Answer 1:


Say you have a static scene and a moving camera (or, equivalently, a rigidly moving scene and a static camera), and you want to reconstruct the scene geometry and camera motion from two or more images. The reconstruction is usually based on obtaining point correspondences: each correspondence gives equations that must be solved for the 3D points and the camera motion.
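As a rough sketch of what "equations to be solved" can look like, the snippet below writes the reprojection residuals for one view under a normalized pinhole camera (focal length 1, an assumption). The function names and the three-point scene are hypothetical; a real pipeline would stack such residuals over all views and correspondences and minimize them jointly over the points and the motion.

```python
import numpy as np

def project_perspective(points, R, t):
    """Project Nx3 world points with a normalized pinhole camera (assumed focal length 1)."""
    cam = points @ R.T + t              # world -> camera coordinates
    return cam[:, :2] / cam[:, 2:3]     # divide x and y by depth Z

def reprojection_residuals(points, R, t, observed):
    """Observed-vs-predicted differences; nonlinear SfM minimizes their sum of squares."""
    return (project_perspective(points, R, t) - observed).ravel()

# Hypothetical scene: three points in front of the camera.
X = np.array([[0.0, 0.0, 5.0], [1.0, 0.0, 6.0], [0.0, 1.0, 4.0]])
R = np.eye(3)
t = np.array([0.1, -0.2, 0.0])
obs = project_perspective(X, R, t)      # pretend these were matched in the image

print(np.abs(reprojection_residuals(X, R, t, obs)).max())  # 0.0 for the true parameters
```

Perturbing `R`, `t` or `X` makes the residuals nonzero, which is what a least-squares solver drives back toward zero.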

The solution can be based either on nonlinear minimization or on various approximations. The camera can be modeled with either orthographic or perspective projection. In the simplest SfM case, the camera is approximated by orthographic projection (or, more generally, by weak perspective projection), and the scene can be recovered up to scale. However, translation perpendicular to the image plane (along the optical axis) can never be recovered, because orthographic projection discards depth.
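One classic orthographic SfM method is Tomasi-Kanade factorization; the answer does not name it, so treat this as a swapped-in illustration. Under orthographic projection, the centered measurement matrix of P points tracked over F views has rank 3 (motion times shape), and shifting the camera along the optical axis leaves the images unchanged. The rotations, translations and random scene below are hypothetical, and the metric-upgrade step of the full method is not shown.

```python
import numpy as np

def rotation_z(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def rotation_y(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def orthographic(points, R, t):
    """Orthographic camera: rotate and translate, then drop the depth coordinate."""
    cam = points @ R.T + t
    return cam[:, :2]

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 3))                  # hypothetical rigid scene of 10 points

rows = []
for k in range(4):                            # four hypothetical views
    R = rotation_y(0.2 * k) @ rotation_z(0.1 * k)
    t = np.array([0.3 * k, -0.1 * k, 2.0 + k])  # t_z changes in every view
    uv = orthographic(X, R, t)
    rows.extend([uv[:, 0], uv[:, 1]])
W = np.array(rows)                            # 2F x P measurement matrix
W_centered = W - W.mean(axis=1, keepdims=True)  # removes in-plane translation

sv = np.linalg.svd(W_centered, compute_uv=False)
rank = int(np.sum(sv > 1e-9 * sv[0]))
print(rank)  # 3: (2F x 3 motion) times (3 x P shape)

# Translation along the optical axis leaves no trace in the image.
R0 = rotation_y(0.3)
t0 = np.array([0.0, 0.0, 2.0])
same = np.allclose(orthographic(X, R0, t0), orthographic(X, R0, t0 + np.array([0.0, 0.0, 5.0])))
print(same)  # True
```

Note that the varying t_z values never enter `W` at all, which is exactly why they cannot be recovered from it.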

More recent SfM methods use perspective projection, because orthographic projection cannot recover all of the information. With full perspective projection we can recover, for example, the translation along the optical axis. That is, the geometry and the full motion can be recovered up to a global scale factor.
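A small check of both claims, assuming a pinhole camera with focal length `f` (the scene, translation and scale factor are made-up values): moving the camera along the optical axis now changes the image, but scaling the whole scene and the translation by the same factor does not, which is the remaining global scale ambiguity.

```python
import numpy as np

def perspective(points, R, t, f=1.0):
    """Pinhole projection: x = f * X_c / Z_c, y = f * Y_c / Z_c."""
    cam = points @ R.T + t
    return f * cam[:, :2] / cam[:, 2:3]

X = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.5], [0.0, 1.0, -0.5], [1.0, 1.0, 0.0]])
R = np.eye(3)
t = np.array([0.2, 0.1, 5.0])

# Moving the camera along the optical axis changes the image.
moved = perspective(X, R, t + np.array([0.0, 0.0, 1.0]))
print(np.allclose(perspective(X, R, t), moved))  # False

# Scaling scene and translation together leaves the image unchanged.
s = 3.0
print(np.allclose(perspective(X, R, t), perspective(s * X, R, s * t)))  # True
```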




Answer 2:


To understand why each method is chosen, we need to look at how the camera is modeled in the orthographic case and in the perspective case.

The orthographic camera model is a special case where we assume that the scene is infinitely far from the center of projection. This means we assume there is no perspective distortion caused by the distance between the object and the camera: the size of an object in the image does not depend on its depth. As a consequence, the image coordinates of a point are simply its real-world X and Y coordinates (up to a scale factor), and its depth Z is dropped.

For example, if we have a triangle in the real world with vertices (X1,Y1,Z1), (X2,Y2,Z2), (X3,Y3,Z3), we expect to see it in the image at (x1,y1), (x2,y2), (x3,y3), where X1 = w*x1, X2 = w*x2, X3 = w*x3 and Y1 = w*y1, Y2 = w*y2, Y3 = w*y3, with w some scaling factor.
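The triangle example can be written directly as a scaled orthographic projection, x = X / w and y = Y / w. The vertex values and w = 2 below are hypothetical; the second check moves every vertex to a different depth and shows the image does not change.

```python
import numpy as np

def scaled_orthographic(points_3d, w):
    """Scaled orthographic projection: X = w*x, Y = w*y, and Z is ignored."""
    return points_3d[:, :2] / w

triangle = np.array([[2.0, 0.0, 10.0], [4.0, 0.0, 12.0], [2.0, 2.0, 11.0]])
image = scaled_orthographic(triangle, w=2.0)
print(image)
# [[1. 0.]
#  [2. 0.]
#  [1. 1.]]

flattened = triangle.copy()
flattened[:, 2] = 50.0  # push every vertex to a different depth
print(np.array_equal(image, scaled_orthographic(flattened, w=2.0)))  # True
```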

When is this a good assumption? Notice that I didn't take the Z values of the points into consideration. So the assumption works well when we look at a scene whose distance from the camera is almost constant, that is, when the depth variation within the scene is small compared with its distance from the camera.
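To see this numerically, the sketch below compares true perspective projection with weak perspective (scaled orthographic with w = Z_avg / f, an assumed choice of scale) for a hypothetical four-point object at distance 10. The object shape and distances are made up; the point is only that the approximation error shrinks as the depth range shrinks relative to the distance.

```python
import numpy as np

def perspective(points, f=1.0):
    return f * points[:, :2] / points[:, 2:3]

def weak_perspective(points, f=1.0):
    z_avg = points[:, 2].mean()  # one common depth for the whole object
    return f * points[:, :2] / z_avg

def max_error(depth_range, distance=10.0):
    """Largest image-coordinate gap between the two models for a hypothetical object."""
    pts = np.array([[1.0, 1.0, distance - depth_range / 2],
                    [-1.0, 1.0, distance + depth_range / 2],
                    [1.0, -1.0, distance],
                    [-1.0, -1.0, distance]])
    return np.abs(perspective(pts) - weak_perspective(pts)).max()

for dz in (4.0, 1.0, 0.1, 0.0):
    print(dz, max_error(dz))  # error drops toward zero as dz shrinks
```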

Note: This is a very simplistic explanation that doesn't take into account many other factors, such as the lens distortion of the camera itself.



Source: https://stackoverflow.com/questions/39521396/how-do-orthographic-and-perspective-camera-models-in-structure-from-motion-diffe
