I'm trying to perform object detection with R-CNN on my own dataset, following the tutorial on the MATLAB website. Based on the picture below:
I'm supposed to put image paths in the first column and the bounding box of each object in the following columns. But in each of my images, there is more than one object of each kind. For example there are 20 vehicles in one image. How should I deal with that? Should I create a separate row for each instance of vehicle in an image?
The example found on the website finds the pixel neighbourhood with the largest score and draws a bounding box around that region in the image. Having multiple objects in the image complicates things. There are two approaches that you can use to find multiple objects.
- Find all bounding boxes with scores that surpass some global threshold.
- Find the bounding box with the largest score, then find those bounding boxes whose scores surpass a percentage of that largest score. This percentage is arbitrary, but from experience and from what I have seen in practice, people tend to choose between 80% and 95% of the largest score found in the image. This will of course give you false positives if you submit a query image with objects that the classifier was not trained to detect, so you will have to implement some additional post-processing logic on your end.
An alternative approach would be to choose some value `k` and display the top `k` bounding boxes associated with the `k` highest scores. This of course requires that you know the value of `k` beforehand and, like the second approach, it will always assume that you have found an object in the image.
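The top-`k` selection can be sketched as follows. This is only a minimal illustration: the boxes, scores and the value of `k` are made-up stand-ins for the real outputs of `detect`, and the variable names `topIdx`, `topBoxes` and `topScore` are assumptions, not part of any toolbox API.

```matlab
% Hypothetical detections standing in for the outputs of detect():
% one row of [x, y, width, height] per candidate box.
bboxes = [ 10  20  50  40;
          200  35  60  45;
          120 150  55  42;
          300 210  48  38];
score  = [0.62; 0.91; 0.35; 0.78];

k = 2;                                  % assumed number of objects to keep

% Guard against k being larger than the number of detections
k = min(k, numel(score));

% Sort the scores from highest to lowest and remember the original positions
[sortedScore, order] = sort(score, 'descend');

topIdx   = order(1:k);                  % indices of the k best detections
topBoxes = bboxes(topIdx, :);           % their bounding boxes
topScore = sortedScore(1:k);            % their scores

disp(topBoxes)
disp(topScore)
```

The rows of `topBoxes` can then be annotated one at a time, in the same way as any other subset of the detected boxes.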
In addition to the above logic, the approach that you describe, creating a separate row for each instance of a vehicle in the image, is correct. If you have multiple instances of an object in a single image, you would introduce one row per instance while keeping the image filename the same. For example, with 20 vehicles in one image, you would create 20 rows in your table that all share the same filename, each with a single bounding box specification for one distinct object in that image.
Once you have done this, assuming that you have already trained the R-CNN detector and you want to use it, the original code to detect objects, taken from the website, is the following:
```matlab
% Read test image
testImage = imread('stopSignTest.jpg');

% Detect stop signs
[bboxes, score, label] = detect(rcnn, testImage, 'MiniBatchSize', 128)

% Display the detection results
[score, idx] = max(score);

bbox = bboxes(idx, :);
annotation = sprintf('%s: (Confidence = %f)', label(idx), score);

outputImage = insertObjectAnnotation(testImage, 'rectangle', bbox, annotation);

figure
imshow(outputImage)
```
This only works for the one object that has the highest score. If you wanted to do this for multiple objects, you would use the `score` that is output from the `detect` method and find those locations that accommodate either situation 1 or situation 2.
If you had situation 1, you would modify it to look like the following.
```matlab
% Read test image
testImage = imread('stopSignTest.jpg');

% Detect stop signs
[bboxes, score, label] = detect(rcnn, testImage, 'MiniBatchSize', 128)

% New - Find those bounding boxes that surpassed a threshold
T = 0.7; % Define threshold here
idx = score >= T;

% Retrieve those scores that surpassed the threshold
s = score(idx);

% Do the same for the labels as well
lbl = label(idx);

bbox = bboxes(idx, :); % This logic doesn't change

% New - Loop through each box and print out its confidence on the image
outputImage = testImage; % Make a copy of the test image to write to
for ii = 1 : size(bbox, 1)
    annotation = sprintf('%s: (Confidence = %f)', lbl(ii), s(ii)); % Change
    outputImage = insertObjectAnnotation(outputImage, 'rectangle', bbox(ii,:), annotation); % New - Choose the right box
end

figure
imshow(outputImage)
```
Note that I've kept the original bounding boxes, labels and scores in their original variables and stored the subset that surpassed the threshold in separate variables, in case you want to cross-reference between the two. To accommodate situation 2, the code remains the same as for situation 1 except for how the threshold is defined.
The code from:
```matlab
% New - Find those bounding boxes that surpassed a threshold
T = 0.7; % Define threshold here
idx = score >= T;
```
... would now change to:
```matlab
% New - Find those bounding boxes that surpassed a threshold
perc = 0.85; % 85% of the maximum score
T = perc * max(score); % Define threshold here
idx = score >= T;
```
The end result will be multiple bounding boxes of the detected objects in the image - one annotation per detected object.
I think you actually have to put all of the coordinates for that image as a single entry in your training data table. See this MATLAB tutorial for details. If you load the training data into MATLAB locally and check the `vehicleDataset` variable, you will actually see this (sorry my score is not high enough to include images directly in my answers).
To summarize, in your training data table, make sure you have one unique entry for each image, and put however many bounding boxes there are into the corresponding category as a matrix, where each row is in the format `[x, y, width, height]`.
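A minimal sketch of that layout, assuming two made-up image files and a single `vehicle` category (the file names, box coordinates and variable names are invented for illustration): each table row holds one image, and its `vehicle` cell holds an M-by-4 matrix with one `[x, y, width, height]` row per object.

```matlab
% Hypothetical image files; replace with the paths to your own images
imageFilename = {'images/scene1.jpg'; 'images/scene2.jpg'};

% Made-up boxes, one [x, y, width, height] row per vehicle
scene1Boxes = [ 30  40  60  35;
               150  42  58  33;
               270  45  62  36];      % three vehicles in the first image
scene2Boxes = [ 80 120  70  40];      % one vehicle in the second image

% One cell per image, each holding that image's M-by-4 box matrix
vehicle = {scene1Boxes; scene2Boxes};

% One row per image, one column per object category
trainingData = table(imageFilename, vehicle);

% Number of boxes stored for each image
numBoxes = cellfun(@(b) size(b, 1), trainingData.vehicle);
disp(numBoxes)
```

With this shape, an image containing 20 vehicles would still occupy a single row, with a 20-by-4 matrix in its `vehicle` cell.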