I have a depth image that I've generated from 3D CAD data. Such a depth image could also come from a depth sensor such as a Microsoft Kinect or any other stereo camera.
I will describe what I think you have to do conceptually and provide links to the relevant parts of OpenCV.
To determine the normal of a given 3D point in a point cloud:

1. Create a kd-tree (or ball-tree?) representation of your point cloud so that you can efficiently compute the k nearest neighbors. Your choice of k should depend on the density of your data. http://docs.opencv.org/trunk/modules/flann/doc/flann_fast_approximate_nearest_neighbor_search http://physics.nyu.edu/grierlab/manuals/opencv/classcv_1_1KDTree.html

2. After querying for the k nearest neighbors of a given point p, use them to find a best-fit plane. You can use PCA to do this; set maxComponents=2. http://physics.nyu.edu/grierlab/manuals/opencv/classcv_1_1PCA.html https://github.com/Itseez/opencv/blob/master/samples/cpp/pca.cpp

3. Step 2 returns two eigenvectors, which define the plane you are interested in. The cross product of these two vectors is (an estimate of) your desired normal vector. You can compute it in OpenCV with Mat::cross: http://docs.opencv.org/modules/core/doc/basic_structures.html
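The three steps above can be sketched in Python with NumPy; this is only an illustration, with a brute-force nearest-neighbor search standing in for the kd-tree and an eigendecomposition of the covariance standing in for cv::PCA:

```python
import numpy as np

def estimate_normal(cloud, idx, k=8):
    # Step 1 (stand-in): brute-force k nearest neighbors instead of a kd-tree.
    p = cloud[idx]
    dists = np.linalg.norm(cloud - p, axis=1)
    nbrs = cloud[np.argsort(dists)[:k]]
    # Step 2: PCA of the neighborhood -- eigenvectors of the covariance matrix.
    centered = nbrs - nbrs.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(centered.T @ centered)
    # The two largest principal components span the best-fit plane.
    v1, v2 = eigvecs[:, 2], eigvecs[:, 1]
    # Step 3: their cross product estimates the normal.
    n = np.cross(v1, v2)
    return n / np.linalg.norm(n)

# Points on the plane z = 0: the estimated normal should be (0, 0, +/-1).
rng = np.random.default_rng(0)
cloud = np.column_stack([rng.uniform(-1, 1, 50),
                         rng.uniform(-1, 1, 50),
                         np.zeros(50)])
n = estimate_normal(cloud, 0)
```

Note the sign of the normal is ambiguous; point-cloud libraries typically orient it toward the viewpoint afterwards.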
So there are a couple of things that are undefined in your question, but I'll do my best to outline an answer.
The basic idea is to take the gradient of the image and then apply a transformation to the gradient to get the normal vectors. Taking the gradient in MATLAB is easy:
[m, g] = imgradient(d);
gives us the magnitude (m) and the direction (g) of the gradient (relative to the horizontal and measured in degrees) of the image at every point. For instance, if we display the magnitude of the gradient for your image, it looks like this:
Now, the harder part is to take this information we have about the gradient and turn it into a normal vector. In order to do this properly we need to know how to transform from image coordinates to world coordinates. For a CAD-generated image like yours, this information is contained in the projection transformation used to make the image. For a real-world image like one you'd get from a Kinect, you would have to look up the spec for the image-capture device.
The key piece of information we need is this: just how wide is each pixel in real-world coordinates? For non-orthonormal projections (like those used by real-world image-capture devices) we can approximate this by assuming each pixel represents light within a fixed angle of the real world. If we know this angle (call it p, measured in radians), then the real-world distance covered by a pixel is just sin(p) .* d, or approximately p .* d, where d is the depth of the image at each pixel.
Now if we have this info, we can construct the 3 components of the normal vectors:
width = p .* d;                  % real-world width of each pixel
% dividing the per-pixel gradient by the pixel width gives a dimensionless
% world-space slope, consistent with normz = 1 below; g is in degrees
gradx = m .* cosd(g) ./ width;
grady = m .* sind(g) ./ width;
normx = - gradx;
normy = - grady;
normz = 1;
len = sqrt(normx .^ 2 + normy .^ 2 + normz .^ 2);
x = normx ./ len;
y = normy ./ len;
z = normz ./ len;
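The same idea can be checked in a quick NumPy sketch. The pixel angle p is an assumed, sensor-dependent value, and the per-pixel depth differences are divided by the pixel's real-world width to get world-space slopes before building the normal (-dD/dx, -dD/dy, 1):

```python
import numpy as np

def depth_to_normals(d, p):
    # p: angular width of one pixel in radians (an assumed value here --
    # look it up or calibrate it for a real device).
    width = p * d                   # approximate real-world width of a pixel
    grady, gradx = np.gradient(d)   # per-pixel depth differences (rows, cols)
    sx = gradx / width              # world-space slope dD/dx
    sy = grady / width              # world-space slope dD/dy
    nx, ny, nz = -sx, -sy, np.ones_like(d)
    length = np.sqrt(nx ** 2 + ny ** 2 + nz ** 2)
    return nx / length, ny / length, nz / length

# A constant-depth image is a fronto-parallel plane: every normal is (0, 0, 1).
x, y, z = depth_to_normals(np.full((4, 4), 2.0), p=0.001)
```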
What mattnewport is suggesting can be done in a pixel shader. For each pixel you calculate two vectors A and B, and the cross product of those vectors gives you the normal. You calculate the two vectors like so:
float2 du; // texel step in u, based on the depth image's width
float2 dv; // texel step in v; normally du = float2(1.0/width, 0) and dv = float2(0, 1.0/height)

float D  = sample(depthtex, uv);
float D1 = sample(depthtex, uv + du);
float D2 = sample(depthtex, uv + dv);

float3 A = float3(du.x * width_of_image, 0, D1 - D);
float3 B = float3(0, dv.y * height_of_image, D2 - D);
float3 normal = normalize(cross(A, B));
return normal;
This will break when there are discontinuities in the depth values.
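A CPU prototype of the same cross-product construction is easy to sketch in Python, with array indexing in place of the shader's sample and one-texel steps for du and dv (names like normal_at are mine, not from the shader):

```python
import numpy as np

def normal_at(depth, x, y):
    # Finite differences in place of the shader's neighbouring texel samples.
    D  = depth[y, x]
    D1 = depth[y, x + 1]              # sample at uv + du (one texel right)
    D2 = depth[y + 1, x]              # sample at uv + dv (one texel down)
    A = np.array([1.0, 0.0, D1 - D])  # du.x * width_of_image = 1 texel
    B = np.array([0.0, 1.0, D2 - D])
    n = np.cross(A, B)                # the shader's A x B
    return n / np.linalg.norm(n)

# Constant depth: the normal points straight back at the camera.
flat = np.full((8, 8), 3.0)
n = normal_at(flat, 3, 3)
```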
To determine whether a surface is flat in the pixel shader, you can use the second-order partial derivatives. You calculate them with finite differences: take the first-order differences and then difference those again, like so:
float D  = sample(depthtex, uv);
float D1 = sample(depthtex, uv + du);
float D3 = sample(depthtex, uv - du);

float dx1 = (D1 - D) / du.x;
float dx2 = (D - D3) / du.x;
float dxx = (dx2 - dx1) / du.x;
In the same way you calculate dyy, dxy, and dyx. The surface is flat if dxx = dyy = dxy = dyx = 0 (in practice, if they are all below a small threshold, since real depth data is noisy).
Typically, you'd choose du and dv to be 1/width and 1/height of the depth image.
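Here's a small Python sketch of that flatness test using central differences with du = dv = one texel; the eps threshold is an assumption, since measured depth is never exactly curvature-free:

```python
import numpy as np

def is_flat(depth, x, y, eps=1e-6):
    # Central second-order finite differences with du = dv = 1 texel.
    dxx = depth[y, x + 1] - 2 * depth[y, x] + depth[y, x - 1]
    dyy = depth[y + 1, x] - 2 * depth[y, x] + depth[y - 1, x]
    # Mixed derivative (dxy == dyx for a smooth depth function).
    dxy = (depth[y + 1, x + 1] - depth[y + 1, x - 1]
           - depth[y - 1, x + 1] + depth[y - 1, x - 1]) / 4.0
    return abs(dxx) < eps and abs(dyy) < eps and abs(dxy) < eps

yy, xx = np.mgrid[0:7, 0:7]
ramp = 0.5 * xx + 0.25 * yy       # planar depth: flat everywhere
bump = ramp + (xx - 3.0) ** 2     # curved in x: not flat
```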
All of this happens on the GPU, which makes everything really fast. But if you don't care about that, you can run this method on the CPU as well. The only issue is that you'll have to replace a function like sample and implement your own version of it: it takes the depth image and the u, v values as input and returns a depth value at the sampled point.
Edit:
Here's a hypothetical sampling function that does nearest neighbour sampling on the CPU.
float Sample(const Texture& texture, vector_2d uv) {
    // Nearest neighbour: round the normalised uv to the closest texel.
    int x = (int)(uv.x * texture.width + 0.5f);
    int y = (int)(uv.y * texture.height + 0.5f);
    return texture.data[x][y];
}