How do I reverse-project 2D points into 3D?

半阙折子戏 2020-11-28 18:25

I have 4 2D points in screen-space, and I need to reverse-project them back into 3D space. I know that each of the 4 points is a corner of a 3D-rotated rigid rectangle, and

13 Answers
  • 2020-11-28 18:42

    When you project from 3D to 2D you lose information.

    In the simple case of a single point, the inverse projection would give you an infinite ray through 3D space.

    Stereoscopic reconstruction will typically start with two 2D images, project both back to 3D, and then look for an intersection of the two resulting 3D rays.

    Projection can take different forms: orthographic or perspective. I'm guessing that you are assuming an orthographic projection?

    In your case, assuming you had the original projection matrix, you would have 4 rays in 3D space. You would then be able to constrain the problem with your 3D rectangle's dimensions and attempt to solve it.

    The solution will not be unique, as a rotation around either axis that is parallel to the 2D projection plane is ambiguous in direction. In other words, if the 2D image is perpendicular to the z axis, then rotating the 3D rectangle clockwise or anticlockwise around the x axis would produce the same image. Likewise for the y axis.

    In the case where the rectangle plane is parallel to the z axis you have even more solutions.

    As you don't have the original projection matrix, further ambiguity is introduced by an arbitrary scaling factor that exists in any projection: you cannot distinguish between a scaling in the projection and a translation along the z axis in 3D. This is not a problem if you are only interested in the relative positions of the 4 points in 3D space with respect to each other, rather than to the plane of the 2D projection.

    In a perspective projection things get harder...
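
    To make the "one pixel, one ray" idea above concrete, here is a minimal sketch (my own, not part of this answer) that unprojects a pixel at two depths through an assumed known 4x4 projection matrix and joins the results, much as gluUnProject() does. The identity matrix and the 720x576 viewport are stand-in assumptions:

    import numpy as np

    def unproject(px, py, depth, proj, width=720, height=576):
        # Pixel -> normalized device coordinates in [-1, 1] (y flipped, image origin at the top left).
        ndc = np.array([2.0 * px / width - 1.0,
                        1.0 - 2.0 * py / height,
                        2.0 * depth - 1.0,
                        1.0])
        world = np.linalg.inv(proj) @ ndc
        return world[:3] / world[3]

    proj = np.eye(4)                     # stand-in; substitute your real projection matrix here
    near = unproject(318, 247, 0.0, proj)
    far = unproject(318, 247, 1.0, proj)
    direction = far - near               # the infinite ray belonging to that pixel
    print(near, direction)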

  • 2020-11-28 18:43

    Alright, I came here looking for an answer and didn't find something simple and straightforward, so I went ahead and did the dumb but effective (and relatively simple) thing: Monte Carlo optimisation.

    Very simply put, the algorithm is as follows: Randomly perturb your projection matrix until it projects your known 3D coordinates to your known 2D coordinates.

    Here is a still photo from Thomas the Tank Engine:

    Let's say we use GIMP to find the 2D coordinates of what we think is a square on the ground plane (whether or not it is really a square depends on your judgment of the depth):

    I get four points in the 2D image: (318, 247), (326, 312), (418, 241), and (452, 303).

    By convention, we say that these points should correspond to the 3D points: (0, 0, 0), (0, 0, 1), (1, 0, 0), and (1, 0, 1). In other words, a unit square in the y=0 plane.

    Projecting each of these 3D coordinates into 2D is done by multiplying the 4D vector [x, y, z, 1] with a 4x4 projection matrix and then dividing the x and y components by the resulting w component (the perspective divide). This is more or less what gluProject() does, except that gluProject() also takes the current viewport and a separate modelview matrix into account (we can just assume the modelview matrix is the identity matrix). It is very handy to look at the gluProject() documentation, because I actually want a solution that works for OpenGL, but beware that the documentation's formula omits the perspective divide.

    Remember, the algorithm is to start with some projection matrix and randomly perturb it until it gives the projection that we want. So what we're going to do is project each of the four 3D points and see how close we get to the 2D points we wanted. If our random perturbations cause the projected 2D points to get closer to the ones we marked above, then we keep that matrix as an improvement over our initial (or previous) guess.

    Let's define our points:

    # Known 2D coordinates of our rectangle
    i0 = Point2(318, 247)
    i1 = Point2(326, 312)
    i2 = Point2(418, 241)
    i3 = Point2(452, 303)
    
    # 3D coordinates corresponding to i0, i1, i2, i3
    r0 = Point3(0, 0, 0)
    r1 = Point3(0, 0, 1)
    r2 = Point3(1, 0, 0)
    r3 = Point3(1, 0, 1)
    

    We need to start with some matrix; the identity matrix seems a natural choice:

    mat = [
        [1, 0, 0, 0],
        [0, 1, 0, 0],
        [0, 0, 1, 0],
        [0, 0, 0, 1],
    ]
    

    We need to actually implement the projection (which is basically a matrix multiplication):

    def project(p, mat):
        # Multiply the homogeneous point [x, y, z, 1] by the matrix, do the
        # perspective divide, then map to the 720x576 viewport (y flipped).
        x = mat[0][0] * p.x + mat[0][1] * p.y + mat[0][2] * p.z + mat[0][3] * 1
        y = mat[1][0] * p.x + mat[1][1] * p.y + mat[1][2] * p.z + mat[1][3] * 1
        w = mat[3][0] * p.x + mat[3][1] * p.y + mat[3][2] * p.z + mat[3][3] * 1
        return Point2(720 * (x / w + 1) / 2., 576 - 576 * (y / w + 1) / 2.)
    

    This is basically what gluProject() does. 720 and 576 are the width and height of the image, respectively (i.e. the viewport), and we subtract from 576 to account for the fact that we counted y coordinates from the top, while OpenGL typically counts them from the bottom. You'll notice we're not calculating z; that's because we don't really need it here (though it could be handy to ensure it falls within the range that OpenGL uses for the depth buffer).

    Now we need a function for evaluating how close we are to the correct solution. The value returned by this function is what we will use to check whether one matrix is better than another. I chose to go by sum of squared distances, i.e.:

    # The squared distance between two points a and b
    def norm2(a, b):
        dx = b.x - a.x
        dy = b.y - a.y
        return dx * dx + dy * dy
    
    def evaluate(mat): 
        c0 = project(r0, mat)
        c1 = project(r1, mat)
        c2 = project(r2, mat)
        c3 = project(r3, mat)
        return norm2(i0, c0) + norm2(i1, c1) + norm2(i2, c2) + norm2(i3, c3)
    

    To perturb the matrix, we simply pick an element to perturb by a random amount within some range:

    def perturb(mat, amount):
        from copy import deepcopy
        from random import randrange, uniform
        mat2 = deepcopy(mat)
        mat2[randrange(4)][randrange(4)] += uniform(-amount, amount)
        return mat2
    

    (It's worth noting that our project() function doesn't actually use mat[2] at all, since we don't compute z, and since all our y coordinates are 0 the mat[*][1] values are irrelevant as well. We could use this fact and never try to perturb those values, which would give a small speedup, but that is left as an exercise...)
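
    For what it's worth, one way that shortcut could look (a sketch of my own, not part of the original answer) is to restrict the random pick to the rows and columns that project() and our y = 0 points actually read:

    def perturb_used(mat, amount):
        from copy import deepcopy
        from random import choice, uniform
        mat2 = deepcopy(mat)
        # Rows 0, 1 and 3 are the only ones project() reads, and column 1 only
        # multiplies the y coordinate, which is 0 for all four of our points.
        mat2[choice([0, 1, 3])][choice([0, 2, 3])] += uniform(-amount, amount)
        return mat2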

    For convenience, let's add a function that does the bulk of the approximation by calling perturb() over and over again on what is the best matrix we've found so far:

    def approximate(mat, amount, n=100000):
        est = evaluate(mat)
    
        for i in range(n):
            mat2 = perturb(mat, amount)
            est2 = evaluate(mat2)
            if est2 < est:
                mat = mat2
                est = est2
    
        return mat, est
    

    Now all that's left to do is to run it...:

    for i in range(100):
        mat, est = approximate(mat, 1)
        mat, est = approximate(mat, .1)
    

    I find this already gives a pretty accurate answer. After running for a while, the matrix I found was:

    [
        [1.0836000765696232,  0,  0.16272110011060575, -0.44811064935115597],
        [0.09339193527789781, 1, -0.7990570384334473,   0.539087345090207  ],
        [0,                   0,  1,                    0                  ],
        [0.06700844759602216, 0, -0.8333379578853196,   3.875290562060915  ],
    ]
    

    with an error of around 2.6e-5. (Notice how the elements we said were not used in the computation have not actually been changed from our initial matrix; that's because changing these entries would not change the result of the evaluation and so the change would never get carried along.)

    We can pass the matrix to OpenGL using glLoadMatrixf() (but remember to transpose it first, and remember to load your modelview matrix with the identity matrix):

    def transpose(m):
        return [
            [m[0][0], m[1][0], m[2][0], m[3][0]],
            [m[0][1], m[1][1], m[2][1], m[3][1]],
            [m[0][2], m[1][2], m[2][2], m[3][2]],
            [m[0][3], m[1][3], m[2][3], m[3][3]],
        ]
    
    glLoadMatrixf(transpose(mat))
    

    Now we can for example translate along the z axis to get different positions along the tracks:

    glTranslate(0, 0, frame)
    frame = frame + 1
    
    glBegin(GL_QUADS)
    glVertex3f(0, 0, 0)
    glVertex3f(0, 0, 1)
    glVertex3f(1, 0, 1)
    glVertex3f(1, 0, 0)
    glEnd()
    

    For sure this is not very elegant from a mathematical point of view; you don't get a closed-form equation that you can just plug your numbers into for a direct (and accurate) answer. HOWEVER, it does allow you to add additional constraints without having to worry about complicating your equations; for example, if we wanted to incorporate height as well, we could use that corner of the house and say (in our evaluation function) that the distance from the ground to the roof should be so-and-so, and run the algorithm again. So yes, it's a brute force of sorts, but it works, and it works well.
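
    As a purely hypothetical illustration of such an extra constraint (i4, r4 and the height of 2.5 are made-up values, not from the original answer), the evaluation function could simply gain one more term:

    # A fifth 2D point marking the roof corner (made up), required to be the
    # projection of the ground corner (1, 0, 0) lifted to an assumed height of 2.5.
    i4 = Point2(380, 120)
    r4 = Point3(1, 2.5, 0)

    def evaluate_with_height(mat):
        return evaluate(mat) + norm2(i4, project(r4, mat))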

  • 2020-11-28 18:43

    If you know the shape is a rectangle in a plane, you can constrain the problem much further. You certainly cannot figure out which plane it is, so you may as well choose one: say it is lying in the plane z = 0, with one corner at x = y = 0 and the edges parallel to the x and y axes.

    The points in 3D are therefore {0,0,0}, {w,0,0}, {w,h,0} and {0,h,0}. I'm pretty certain the absolute size cannot be recovered, so only the ratio w/h is relevant; that is one unknown.

    Relative to this plane, the camera must be at some point cx, cy, cz in space, must be pointing in a direction nx, ny, nz (a vector of length one, so one of these is redundant), and must have a focal_length/image_width factor f. These numbers turn into a 3x3 projection matrix.

    That gives a total of 7 unknowns: w/h, cx, cy, cz, nx, ny, and f.

    You have a total of 8 knowns: the 4 (x, y) pairs.

    So this can be solved.

    The next step is to use MATLAB or Mathematica.
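
    As an illustration of that last step (my own sketch, not part of the original answer), the same 7-unknown formulation can be fed to a general nonlinear least-squares solver in Python. The pixel values are borrowed from the Monte Carlo answer above, the principal point is assumed to sit at the centre of a 720x576 frame, and roll is ignored as in the parameterization described here; how well it converges depends on the initial guess:

    import numpy as np
    from scipy.optimize import least_squares

    # Pixel corners (borrowed from the other answer, purely for illustration),
    # listed in the same order as the 3D corners built inside residuals().
    observed = np.array([[318, 247], [418, 241], [452, 303], [326, 312]], dtype=float)
    observed[:, 1] = 576.0 - observed[:, 1]        # flip y so it grows upwards (assumed 576-pixel-high image)
    observed -= np.array([360.0, 288.0])           # move the origin to an assumed principal point at the image centre

    def residuals(params):
        ratio, cx, cy, cz, yaw, pitch, f = params
        # Rectangle in the z = 0 plane, width fixed to 1, height = ratio.
        corners = np.array([[0, 0, 0], [1, 0, 0], [1, ratio, 0], [0, ratio, 0]], dtype=float)

        # Viewing direction from two angles; roll is ignored, as in the text above.
        d = np.array([np.cos(pitch) * np.sin(yaw),
                      np.cos(pitch) * np.cos(yaw),
                      np.sin(pitch)])
        right = np.cross(d, [0.0, 0.0, 1.0])
        right = right / np.linalg.norm(right)
        up = np.cross(right, d)
        R = np.vstack([right, up, d])              # world -> camera rotation

        cam = (corners - np.array([cx, cy, cz])) @ R.T   # corners in the camera frame
        proj = f * cam[:, :2] / cam[:, 2:3]              # pinhole projection with focal factor f
        return (proj - observed).ravel()                 # 8 residuals for 7 unknowns

    guess = [1.0, 0.5, -3.0, 2.0, 0.0, -0.5, 500.0]
    solution = least_squares(residuals, guess)
    print(solution.x)   # [ratio, cx, cy, cz, yaw, pitch, f]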

  • 2020-11-28 18:44

    Assuming that the points are indeed part of a rectangle, here is a generic idea:

    Find the two points with the maximum inter-distance: these most probably define a diagonal (exception: special cases where the rectangle is almost parallel to the YZ plane, left for the student). Call them A and C. Calculate the BAD and BCD angles. Compared to right angles, these give you the orientation in 3D space. To find out about the z distance, you need to correlate the projected sides to the known sides, and then, based on the 3D projection method (is it 1/z?), you're on the right track to knowing the distances.
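
    A small sketch of the first two steps (my own, reusing the 2D corners from the other answers purely for illustration): pick the most distant pair as the diagonal A-C, then measure the projected angles BAD and BCD at the diagonal's endpoints.

    import itertools, math

    pts = [(318, 247), (326, 312), (418, 241), (452, 303)]

    def dist2(p, q):
        return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

    # The diagonal is the pair of points with the largest separation.
    A, C = max(itertools.combinations(pts, 2), key=lambda pair: dist2(*pair))
    B, D = [p for p in pts if p not in (A, C)]

    def angle(at, p, q):
        # Angle at corner `at` between the rays at->p and at->q, in degrees.
        v1 = (p[0] - at[0], p[1] - at[1])
        v2 = (q[0] - at[0], q[1] - at[1])
        cos = (v1[0] * v2[0] + v1[1] * v2[1]) / math.sqrt(dist2(at, p) * dist2(at, q))
        return math.degrees(math.acos(max(-1.0, min(1.0, cos))))

    print(angle(A, B, D), angle(C, B, D))   # BAD and BCD, to compare against the 90 degrees of a true rectangle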

  • 2020-11-28 18:45

    Thanks to @Vegard for an excellent answer. I cleaned up the code a little bit:

    
    class Point2:
        def __init__(self,x,y):
            self.x = x
            self.y = y
    
    class Point3:
        def __init__(self,x,y,z):
            self.x = x
            self.y = y
            self.z = z
    
    # Known 2D coordinates of our rectangle
    i0 = Point2(318, 247)
    i1 = Point2(326, 312)
    i2 = Point2(418, 241)
    i3 = Point2(452, 303)
    
    # 3D coordinates corresponding to i0, i1, i2, i3
    r0 = Point3(0, 0, 0)
    r1 = Point3(0, 0, 1)
    r2 = Point3(1, 0, 0)
    r3 = Point3(1, 0, 1)
    
    mat = [
        [1, 0, 0, 0],
        [0, 1, 0, 0],
        [0, 0, 1, 0],
        [0, 0, 0, 1],
    ]
    
    def project(p, mat):
        #print mat
        x = mat[0][0] * p.x + mat[0][1] * p.y + mat[0][2] * p.z + mat[0][3] * 1
        y = mat[1][0] * p.x + mat[1][1] * p.y + mat[1][2] * p.z + mat[1][3] * 1
        w = mat[3][0] * p.x + mat[3][1] * p.y + mat[3][2] * p.z + mat[3][3] * 1
        return Point2(720 * (x / w + 1) / 2., 576 - 576 * (y / w + 1) / 2.)
    
    # The squared distance between two points a and b
    def norm2(a, b):
        dx = b.x - a.x
        dy = b.y - a.y
        return dx * dx + dy * dy
    
    def evaluate(mat): 
        c0 = project(r0, mat)
        c1 = project(r1, mat)
        c2 = project(r2, mat)
        c3 = project(r3, mat)
        return norm2(i0, c0) + norm2(i1, c1) + norm2(i2, c2) + norm2(i3, c3)    
    
    def perturb(mat, amount):
        from copy import deepcopy
        from random import randrange, uniform
        mat2 = deepcopy(mat)
        mat2[randrange(4)][randrange(4)] += uniform(-amount, amount)
        return mat2
    
    def approximate(mat, amount, n=1000):
        est = evaluate(mat)
        for i in range(n):
            mat2 = perturb(mat, amount)
            est2 = evaluate(mat2)
            if est2 < est:
                mat = mat2
                est = est2
    
        return mat, est
    
    for i in range(1000):
        mat, est = approximate(mat, 1)
        print(mat)
        print(est)
    

    The approximate call with .1 did not work for me, so I took it out. I ran it for a while too, and last I checked it was at

    [[0.7576315397559887, 0, 0.11439449272592839, -0.314856490473439], 
    [0.06440497208710227, 1, -0.5607502645413118, 0.38338196981556827], 
    [0, 0, 1, 0], 
    [0.05421620936883742, 0, -0.5673977598434641, 2.693116299312736]]
    

    with an error around 0.02.

  • 2020-11-28 18:46

    I'll get my linear algebra book out when I get home if nobody has answered. But @D G, not all matrices are invertible. Singular matrices (determinant = 0) aren't invertible. This will actually happen all the time, since a projection matrix must be square and idempotent (P^2 = P), so its eigenvalues can only be 0 and 1.

    An easy example is [[0, 1], [0, 1]]: its determinant is 0, and it is a projection onto the line x = y!
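
    A quick numeric check of that example (my own addition):

    import numpy as np

    P = np.array([[0, 1], [0, 1]], dtype=float)
    print(np.linalg.det(P))          # 0.0 -> singular, so not invertible
    print(np.allclose(P @ P, P))     # True -> idempotent, so P is a projection
    print(P @ np.array([3.0, 5.0]))  # [5. 5.] -> every point lands on the line x = y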
