I need some help with LBP-based face detection, which is why I am writing this.
I have the following questions about face detection as implemented in OpenCV:
I refer you to an earlier answer of mine, which touches lightly on the topic but doesn't explain the XML cascade format.
Let's look at a fake example, modified for clarity, of a cascade with only a single stage and three features.
```xml
<!-- stage 0 -->
<_>
  <maxWeakCount>3</maxWeakCount>
  <stageThreshold>-0.75</stageThreshold>
  <weakClassifiers>
    <!-- tree 0 -->
    <_>
      <internalNodes>
        0 -1 3 -67130709 -21569 -1426120013 -1275125205 -21585
        -16385 587145899 -24005</internalNodes>
      <leafValues>
        -0.65 0.88</leafValues></_>
    <!-- tree 1 -->
    <_>
      <internalNodes>
        0 -1 0 -163512766 -769593758 -10027009 -262145 -514457854
        -193593353 -524289 -1</internalNodes>
      <leafValues>
        -0.77 0.72</leafValues></_>
    <!-- tree 2 -->
    <_>
      <internalNodes>
        0 -1 2 -363936790 -893203669 -1337948010 -136907894
        1088782736 -134217726 -741544961 -1590337</internalNodes>
      <leafValues>
        -0.71 0.68</leafValues></_></weakClassifiers></_>
```
Somewhat later in the file:
```xml
<features>
  <_>
    <rect>
      0 0 3 5</rect></_>
  <_>
    <rect>
      0 0 4 2</rect></_>
  <_>
    <rect>
      0 0 6 3</rect></_>
  <_>
    <rect>
      0 1 4 3</rect></_>
  <_>
    <rect>
      0 1 3 3</rect></_>
  ...
```
Let us look first at the tags of a stage:
The `maxWeakCount` of a stage is the number of weak classifiers in the stage. The comments call each of these a `<!-- tree -->`, and I call it an LBP feature. In this example, stage 0 has `3` LBP features.

The `stageThreshold` is the minimum that the weights of the features must add up to for the stage to pass. In this example, the stage threshold is `-0.75`.

Turning to the tags describing an LBP feature:
The `internalNodes` tag holds an array of 11 integers. The first two are meaningless for LBP cascades. The third is the index into the `<features>` table of `<rect>`s at the end of the XML file (a `<rect>` describes the geometry of the feature). The last 8 values are eight 32-bit values which together constitute the 256-bit LUT I spoke of in my earlier answer. This LUT is computed by the training process, which I don't fully understand myself. In this example, the first feature of the stage references rectangle `3` (counting from 0), which is described by the four integers `0 1 4 3`.
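To make this layout concrete, here is a minimal C++ sketch that splits the text of one `<internalNodes>` tag into the feature index and the eight LUT words. `LbpNode` and `parseInternalNodes` are hypothetical names, not OpenCV API. The sketch assumes that the tag holds exactly 11 whitespace-separated integers, as in the example above.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical holder for one LBP weak classifier (not an OpenCV type).
struct LbpNode {
    int featureIdx;                   // 3rd integer: index into <features>
    std::array<std::int32_t, 8> lut;  // last 8 integers: the 256-bit LUT
};

// Hypothetical helper: parses the text inside one <internalNodes> tag.
// Assumes exactly 11 whitespace-separated integers; the first two are ignored.
LbpNode parseInternalNodes(const std::string& text) {
    std::istringstream in(text);
    std::array<long long, 11> v{};
    std::size_t n = 0;
    while (n < v.size() && in >> v[n]) ++n;
    assert(n == v.size() && "expected 11 integers");
    LbpNode node{};
    node.featureIdx = static_cast<int>(v[2]);
    for (std::size_t i = 0; i < 8; ++i)
        node.lut[i] = static_cast<std::int32_t>(v[3 + i]);
    return node;
}
```

Applied to tree 0 above, this yields feature index `3` and `lut[0] == -67130709`.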
The `leafValues` are the two weights associated with a feature, the fail weight first and the pass weight second. Depending on the bit selected from the `internalNodes` LUT during feature evaluation, one of those two weights is added to a total. Once every feature of the stage has contributed, this total is compared to the stage's `<stageThreshold>`. Whether the stage passed or failed is then decided by `bool stagePassed = (sum >= stageThreshold - EPS);`, where `EPS` is 1e-5. The weights are also determined by the training process. In this example, the first feature's fail weight is `-0.65` and its pass weight is `0.88`.
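The stage decision can be sketched in a few lines of C++. This is an illustration, not OpenCV's code. `Leaves`, `stagePassed` and `kStage0` are hypothetical names. The sketch also assumes that the LUT bit for each feature has already been selected, with 1 meaning "inconsistent with a face". It sums the chosen weights of the example stage and applies `sum >= stageThreshold - EPS`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical type: one feature's <leafValues> in file order.
struct Leaves { double fail, pass; };

constexpr double EPS = 1e-5;

// Hypothetical helper: lutBits[i] is the LUT bit chosen for feature i
// (1 = inconsistent with a face -> fail weight, 0 = consistent -> pass weight).
bool stagePassed(const std::vector<Leaves>& leaves, const std::vector<int>& lutBits,
                 double stageThreshold) {
    assert(leaves.size() == lutBits.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < leaves.size(); ++i)
        sum += lutBits[i] ? leaves[i].fail : leaves[i].pass;
    return sum >= stageThreshold - EPS;
}

// <leafValues> of trees 0, 1 and 2 of the example stage 0.
const std::vector<Leaves> kStage0 = {{-0.65, 0.88}, {-0.77, 0.72}, {-0.71, 0.68}};
```

With the example's numbers, the stage passes whenever at least one feature selects its pass weight. Even the worst case of two failures and one pass, -0.65 - 0.77 + 0.68 = -0.74, still clears -0.75. Three failures give -2.13.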
Lastly, the `<features>` tag. It consists of an array of `<rect>` tags, each containing 4 integers that describe the geometry of a feature. Given a processing window (24x24 in your case), the first two integers are the feature's `x` and `y` integer pixel offset within the processing window. The next two are the width and height of one subrectangle out of the 9 that are needed to evaluate the LBP feature. All nine subrectangles have that same size and are tiled in a 3x3 grid, so the feature as a whole covers 3*width x 3*height pixels.
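Because the nine subrectangles are tiled 3x3, a `<rect>` only describes a usable feature if its 3*width x 3*height footprint lies inside the window. Here is a small C++ check of that, assuming the 24x24 window from the question. `FeatureRect` and `fitsInWindow` are hypothetical names.

```cpp
#include <cassert>

// Hypothetical type for one <rect>: offset within the window, plus the size
// of ONE of the nine subrectangles.
struct FeatureRect { int x, y, width, height; };

// Hypothetical helper: the 3x3 grid spans 3*width x 3*height pixels starting at
// (x, y), and all of it must lie inside a winWidth x winHeight window.
bool fitsInWindow(const FeatureRect& ft, int winWidth, int winHeight) {
    assert(ft.width > 0 && ft.height > 0);
    return ft.x >= 0 && ft.y >= 0 &&
           ft.x + 3 * ft.width <= winWidth &&
           ft.y + 3 * ft.height <= winHeight;
}
```

All five `<rect>`s listed above pass this check for a 24x24 window. For instance, rectangle `3` (`0 1 4 3`) covers columns 0 to 11 and rows 1 to 9.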
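In essence, then, take a tag `<rect> ft.x ft.y ft.width ft.height </rect>` inside a `pW.width` x `pW.height` processing window that is checking whether a face is present at (`pW.x`, `pW.y`). It corresponds to a 3x3 grid of subrectangles R0 to R8, numbered row by row, each `ft.width` x `ft.height` pixels, with the top-left corner of R0 at (`pW.x + ft.x`, `pW.y + ft.y`). The 16 corners of this grid are the points `p[0..15]`, also numbered row by row.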
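The 16 corner points can be computed directly from the `<rect>` and the window position. This C++ sketch assumes the points are numbered row by row, `p[4*row + col]`. It also uses integral-image coordinates, where the point (X, Y) holds the sum of all pixels with x < X and y < Y. `Point` and `cornerPoints` are hypothetical names.

```cpp
#include <array>
#include <cassert>

struct Point { int x, y; };

// Hypothetical helper: corners of the 3x3 grid in integral-image coordinates,
// numbered p[4*row + col] with row, col in 0..3.
// (winX, winY) is the window position (pW.x, pW.y); ft* are the four <rect> integers.
std::array<Point, 16> cornerPoints(int winX, int winY,
                                   int ftX, int ftY, int ftWidth, int ftHeight) {
    assert(ftWidth > 0 && ftHeight > 0);
    std::array<Point, 16> p{};
    for (int row = 0; row < 4; ++row)
        for (int col = 0; col < 4; ++col)
            p[4 * row + col] = {winX + ftX + col * ftWidth, winY + ftY + row * ftHeight};
    return p;
}
```

Subrectangle Rk, with k = 3*r + c, then has TL = p[4r + c], TR = p[4r + c + 1], BL = p[4r + c + 4] and BR = p[4r + c + 5]. The central R4, for example, uses p[5], p[6], p[9] and p[10].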
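To evaluate the LBP, then, it suffices to read the integral image at the points `p[0..15]` and use `p[BR] + p[TL] - p[TR] - p[BL]` to compute the integral (pixel sum) of each of the nine subrectangles. The sum of the central subrectangle, R4, is compared to that of each of the eight others, clockwise starting from R0, to produce an 8-bit LBP. A bit is 1 when that subrectangle's sum is greater than or equal to R4's. The bits are packed `[msb 0 1 2 5 8 7 6 3 lsb]`, so R0 ends up in the most significant bit and R3 in the least significant.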
This 8-bit LBP is then used as an index into the feature's (2^8 = 256)-bit LUT (the last eight values of `<internalNodes>`), selecting a single bit. If this bit is 1, the feature is inconsistent with a face; if 0, it is consistent with a face. The corresponding weight from `<leafValues>` is then returned and added to the weights of all the other features to produce an overall stage sum. This sum is then compared to `<stageThreshold>` to determine whether the stage passed or failed.
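The LUT lookup and weight selection can be sketched in C++ as follows. `lutBit` and `featureWeight` are hypothetical names, and `kTree0Lut` is simply the LUT of tree 0 from the example. The sketch selects word `code >> 5` of the eight LUT integers and bit `code & 31` within it. Each word is first reinterpreted as unsigned, so that shifting a negative value is well defined.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical helper: bit `code` of the 256-bit LUT stored as eight 32-bit words.
int lutBit(const std::array<std::int32_t, 8>& lut, int code) {
    assert(code >= 0 && code < 256);
    // Reinterpret as unsigned so the right shift is well defined for negative words.
    const std::uint32_t word = static_cast<std::uint32_t>(lut[code >> 5]);
    return static_cast<int>((word >> (code & 31)) & 1u);
}

// Hypothetical helper: bit 1 (not a face) picks the first <leafValues> entry,
// bit 0 (face) picks the second.
double featureWeight(const std::array<std::int32_t, 8>& lut, int code,
                     double failWeight, double passWeight) {
    return lutBit(lut, code) ? failWeight : passWeight;
}

// The LUT of tree 0 from the example stage.
const std::array<std::int32_t, 8> kTree0Lut = {
    -67130709, -21569, -1426120013, -1275125205, -21585, -16385, 587145899, -24005};
```

For tree 0, code 0 selects a 1 bit (fail weight -0.65) and code 2 selects a 0 bit (pass weight 0.88).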
If there's anything else I didn't explain well enough, I can clarify.