Those are some good questions. Here are my answers for you:
Why is the image rotated before applying Hough Transform?
I don't believe this is MATLAB's "official example", judging from a quick look at the documentation page for the function. I suspect you pulled it from another website that we don't have access to. In any case, it is generally not necessary to rotate an image before using the Hough Transform. The goal of the Hough Transform is to find lines in the image in any orientation, so rotating the image should not affect whether the lines are found. If I had to guess, the rotation was performed as a preemptive measure because the lines in the "example image" were most likely oriented at a 33 degree angle clockwise. Performing the reverse rotation would make the lines more or less straight.
What do the entries in H represent?
H is what is known as an accumulator matrix. Before we get into what the purpose of H is and how to interpret the matrix, you need to know how the Hough Transform works. With the Hough Transform, we first perform edge detection on the image. This is done using the Canny edge detector in your case. In the Hough Transform, we parameterize a line using the following relationship:
rho = x*cos(theta) + y*sin(theta)
x and y are the coordinates of a point in the image, most customarily an edge point. theta is the angle that the perpendicular dropped from the origin onto the line through (x, y) makes with the x-axis. rho is the length of that perpendicular, i.e. the perpendicular distance from the origin to the line drawn through (x, y) at the angle theta.
Note that the equation yields infinitely many lines passing through (x, y), one for every angle, so it's common to bin or discretize the possible angles into a predefined set. MATLAB by default assumes there are 180 possible angles covering [-90, 90) with a sampling interval of 1 degree, that is [-90, -89, -88, ... , 88, 89]. For each edge point, you search over this predefined set of angles and determine what the corresponding rho is. Afterwards, you count how many times you see each rho and theta pair. Here's a quick example pulled from Wikipedia:
Source: Wikipedia: Hough Transform
Here we see three black dots that follow a straight line. Ideally, the Hough Transform should determine that these black dots together form a straight line. To give you a sense of the calculations, take a look at the example at 30 degrees. As described earlier, for each point we draw the line through it whose perpendicular from the origin sits at 30 degrees, and we find the perpendicular distance from the origin to that line.
Now what's interesting is that if you look at the perpendicular distance shown at 60 degrees for each point, the distance is more or less the same at about 80 pixels. Seeing the same rho and theta pair for each of the three points is the driving force behind the Hough Transform. Also, what's nice about the above formula is that it implicitly finds the perpendicular distance for you.
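To make the shared rho concrete, here is a minimal sketch using three made-up collinear points (they are not the coordinates from the Wikipedia figure). The points are constructed to lie on the line with theta = 60 degrees and rho = 80, and the variable names are my own. Evaluating the formula at a few angles shows that the three rho values only agree at the angle of the line:

```matlab
% Three hypothetical points constructed to lie on the line
% rho = x*cos(theta) + y*sin(theta) with theta = 60 degrees and rho = 80.
theta0 = 60;  rho0 = 80;
t = [-20 0 20];                              % offsets along the line
x = rho0*cosd(theta0) - t*sind(theta0);
y = rho0*sind(theta0) + t*cosd(theta0);

for theta = [0 30 60 90]
    rho = x*cosd(theta) + y*sind(theta);     % one rho value per point
    fprintf('theta = %2d: rho = %s\n', theta, mat2str(rho, 4));
end
```

At theta = 60 all three points report a rho of 80, while at the other angles each point gets a different rho.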
The process of the Hough Transform is very simple. Suppose we have an edge detected image I and a set of angles theta:
For each edge point (x, y) in the image:
    For each angle A in the angles theta:
        Substitute A into: rho = x*cos(A) + y*sin(A)
        Solve for rho to find the perpendicular distance
        Remember this rho and A pair and increase the count of how many times you have seen it by 1
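The voting loop can be sketched in a few lines of MATLAB. This is a bare-bones illustration under simplifying assumptions, not how the built-in hough function is implemented: the small binary image BW is made up, the origin is placed at the top-left pixel with x and y counted from 0, and rho is simply rounded to the nearest integer bin.

```matlab
% Toy 16 x 16 edge image containing a single diagonal line (made-up data).
BW = logical(eye(16));

theta = -90:89;                              % angle bins in degrees
D = ceil(sqrt(sum((size(BW) - 1).^2)));      % largest possible |rho|
rho = -D:D;                                  % rho bins with a spacing of 1
H = zeros(numel(rho), numel(theta));         % accumulator matrix

[r, c] = find(BW);                           % row/column of every edge point
x = c - 1;  y = r - 1;                       % origin at the top-left pixel

for k = 1:numel(x)
    for tj = 1:numel(theta)
        p = x(k)*cosd(theta(tj)) + y(k)*sind(theta(tj));
        ri = round(p) + D + 1;               % row index of the rho bin
        H(ri, tj) = H(ri, tj) + 1;           % cast one vote
    end
end

% All 16 points lie on the line with theta = -45 and rho = 0,
% so that bin collects one vote from every point.
disp(H(rho == 0, theta == -45));
```

Because the bins are coarse, a few neighbouring angles around -45 collect just as many votes in this toy case, which is one reason peak detection usually suppresses the neighbourhood around each peak it picks.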
So ideally, if we had edge points that follow a straight line, we should see a rho and theta pair whose count is relatively high. This is the purpose of the accumulator matrix H. Each row corresponds to a unique rho value and each column to a unique theta value.
An example of this is shown below:
Source: Google Patents
Taking an example from this matrix, the bin with theta between 25 and 30 and rho between 4 and 4.5 has a count of 8. This means 8 edge points are consistent with a line whose rho and theta fall in that range.
Note that rho can also take infinitely many values, so you not only need to restrict the range of rho, you also have to discretize it with a sampling interval. The default in MATLAB is 1. A calculated rho value will almost always have a fractional part, so you drop the decimal precision, quantizing the value to the bin spacing, to determine the final rho.
For the above example the rho resolution is 0.5, so if you calculated a rho value that falls between 2 and 2.5, it falls in the first column. Also note that the theta values are binned in intervals of 5. You would traditionally compute the Hough Transform with a theta sampling interval of 1 and then merge the bins together. With the MATLAB defaults, however, the bin size is 1. This accumulator matrix tells you how many edge points fit a particular rho and theta combination. Therefore, if many points get mapped to a particular rho and theta value, there is a strong chance that a line defined by rho = x*cos(theta) + y*sin(theta) is present there.
Why is H (Hough Matrix) of size 45x180? Where does this size come from?
This is a consequence of the previous point. The largest distance we would expect from the origin to any point in the image is bounded by the diagonal of the image. This makes sense because going from the top left corner to the bottom right corner, or from the bottom left corner to the top right corner, gives you the greatest distance in the image. The MATLAB documentation for hough defines this as D = sqrt((rows - 1)^2 + (cols - 1)^2), where rows and cols are the number of rows and columns of the image.
For the MATLAB defaults, rho spans from -ceil(D) to ceil(D) in steps of 1. Your rows and columns are both 16, so D = sqrt(15^2 + 15^2) = 21.21..., ceil(D) = 22, and rho spans from -22 to 22, which results in 45 unique rho values. Remember that the default theta values cover [-90, 90) in steps of 1, resulting in 180 unique angle values. We therefore have 45 rows and 180 columns in the accumulator matrix, and hence H is 45 x 180.
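The arithmetic behind the 45 x 180 size can be checked with a short sketch. It assumes the rho range described in the documentation for hough, which as far as I can tell is based on D = sqrt((rows - 1)^2 + (cols - 1)^2) rounded up to a multiple of the rho resolution. The variable names here are my own.

```matlab
rows = 16;  cols = 16;                       % image size from the question
rhoRes = 1;  thetaRes = 1;                   % default resolutions

D = sqrt((rows - 1)^2 + (cols - 1)^2);       % about 21.21
maxRho = rhoRes * ceil(D / rhoRes);          % 22
R = -maxRho:rhoRes:maxRho;                   % -22, -21, ..., 22
T = -90:thetaRes:(90 - thetaRes);            % -90, -89, ..., 89

fprintf('H would be %d x %d\n', numel(R), numel(T));   % H would be 45 x 180
```

If you have the Image Processing Toolbox, calling [H, T, R] = hough(false(16)) and inspecting size(H) should report the same dimensions.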
Why is T of size 1x180? Where does this size come from?
This is an array that tells you all of the angles that were used in the Hough Transform. It should be an array going from -90 to 89 in steps of 1.
Why is R of size 1x45? Where does this size come from?
This is an array that tells you all of the rho values that were used in the Hough Transform. It should be an array that spans from -22 to 22 in steps of 1.
What you should take away from this is that each value in H counts how many times we have seen a rho and theta pair that falls in one particular bin. That is, H(i, j) is the number of edge points whose rho and theta satisfy R(i) <= rho < R(i + 1) and T(j) <= theta < T(j + 1), where i spans from 1 to 44 and j spans from 1 to 179.
What do the entries in P represent? Are they (x, y) or (ϴ, ρ)?
P is the output of the houghpeaks function. Basically, this determines what the possible lines are by finding where the peaks in the accumulator matrix happen. P gives you the actual physical locations in H, as row and column indices, where there is a peak. These locations are:
29 162
29 165
28 170
21 5
29 158
Each row gives you a gateway to the rho and theta parameters required to generate the detected line. Specifically, the first line is characterized by rho = R(29) and theta = T(162). The second line is characterized by rho = R(29) and theta = T(165), etc. To answer your question, the values in P are neither (x, y) nor (ρ, ϴ). They are the row and column locations in H which, when cross-referenced with R and T, give you the parameters that characterize each line detected in the image.
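As a sketch of that cross-referencing, the snippet below rebuilds R and T with the values described earlier for a 16 x 16 image (so it runs without the toolbox) and looks up the rho and theta for each row of P listed above. With the real outputs of hough you would use the returned R and T directly.

```matlab
R = -22:22;                                  % rho values for this image
T = -90:89;                                  % theta values in degrees

P = [29 162; 29 165; 28 170; 21 5; 29 158];  % [row, column] indices into H

rhoPeaks   = R(P(:, 1));                     % rho of each detected line
thetaPeaks = T(P(:, 2));                     % theta of each detected line

disp([rhoPeaks(:), thetaPeaks(:)]);          % one [rho, theta] pair per line
```

For instance, the first row of P maps to rho = R(29) = 6 and theta = T(162) = 71 degrees.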
Why is the value 5 passed into houghpeaks()?
The extra 5 in houghpeaks specifies the maximum number of peaks, and therefore lines, you'd like to detect. We can see that P has 5 rows, corresponding to 5 lines. If fewer than 5 peaks can be found, MATLAB returns as many as it can.
What is the logic behind ceil(0.3*max(H(:)))?
The logic behind this is that to determine peaks in the accumulator matrix, you have to define a minimum threshold that tells you whether a particular rho and theta combination should be considered a valid line. Making this threshold too low reports a lot of false lines, and making it too high misses a lot of lines. What they decided to do here was find the largest bin count in the accumulator matrix, take 30% of that, and take the mathematical ceiling. Any values in the accumulator matrix that are larger than this amount are candidate lines.
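To see what the threshold does, here is a small sketch on a made-up accumulator matrix (the vote counts are invented for illustration). It only marks which bins clear the threshold. houghpeaks additionally suppresses the neighbourhood around each peak it picks, which this sketch does not attempt.

```matlab
H = [0 1  2 1;
     3 9  5 0;
     1 2 12 2];                              % hypothetical vote counts

thresh = ceil(0.3 * max(H(:)));              % 30% of 12 is 3.6, so thresh = 4
candidates = H >= thresh;                    % bins that clear the threshold

fprintf('threshold = %d, candidate bins = %d\n', thresh, nnz(candidates));
```

Here the bins holding 5, 9 and 12 votes survive, and the rest are discarded as too weak. In the original code the same value is handed to houghpeaks through its 'Threshold' parameter.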
Hope this helps!