How to get multiple bounding box coordinates in the TensorFlow Object Detection API

Submitted by 半世苍凉 on 2019-12-11 14:52:02

Question


I want to get the coordinates of multiple bounding boxes, along with the class of each box, and return them as a JSON file.

When I print boxes from the following code, it has a shape of (1, 300, 4), i.e. 300 sets of coordinates. But only 2 boxes are drawn on my predicted image. I want the coordinates of just the bounding boxes that are drawn on my image.

Also, how do I know which bounding box maps to which category/class in the image?

For example, say I have a dog and a person in an image. How would I know which bounding box corresponds to the dog class and which to the person class? boxes is an array of shape (1, 300, 4) with no indication of which bounding box corresponds to which class.

I followed an existing answer that selects bounding boxes from the 300 entries in boxes using a score threshold.

I've tried getting the bounding box with the highest score, but that returns only a single bounding box even when the predicted image has several.

The coordinates of the highest-scoring bounding box don't even match the bounding box coordinates on the predicted image. How do I get the coordinates of the bounding boxes that are drawn on my predicted image?

            vis_util.visualize_boxes_and_labels_on_image_array(
                image_np,
                np.squeeze(boxes),
                np.squeeze(classes).astype(np.int32),
                np.squeeze(scores),
                category_index,
                use_normalized_coordinates=True,
                line_thickness=8)
            im = Image.fromarray(image_np)

            true_boxes = boxes[0][scores[0]==scores.max()]    # Gives us the box with max score
            for i in range(true_boxes.shape[0]):   # rescaling the coordinates
                ymin = true_boxes[i,0]*height
                xmin = true_boxes[i,1]*width
                ymax = true_boxes[i,2]*height
                xmax = true_boxes[i,3]*width

The coordinates I get from the above code (xmin, ymin, xmax, ymax for the box with the max score) don't exactly match the bounding box drawn on the predicted image; they are off by a few pixels. Also, I get only one bounding box even though the predicted image has multiple bounding boxes and multiple classes (e.g. a dog and a person).

I would like to return a JSON file with the image_name, the bounding_boxes, and the class corresponding to each bounding box.

Thanks, I'm new to this. Please ask if any part of the question is unclear.


Answer 1:


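I followed the linked answer and recovered all of my bounding box coordinates by keeping every box whose score exceeds a threshold, rather than only the top-scoring one:

import cv2

min_score_thresh = 0.60
# Keep every box whose score exceeds the threshold
true_boxes = boxes[0][scores[0] > min_score_thresh]
for i in range(true_boxes.shape[0]):
    # Each row is normalized [ymin, xmin, ymax, xmax]; rescale to pixels
    ymin = int(true_boxes[i, 0] * height)
    xmin = int(true_boxes[i, 1] * width)
    ymax = int(true_boxes[i, 2] * height)
    xmax = int(true_boxes[i, 3] * width)

    # Crop the detected region from the image array and save it
    roi = image[ymin:ymax, xmin:xmax].copy()
    cv2.imwrite("box_{}.jpg".format(i), roi)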
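The snippet above does not cover the class mapping or the JSON output that the question asks for. As far as I understand the API, boxes, classes and scores are index-aligned, so entry i of each array describes the same detection, and category_index maps a class id to a dict with a "name" key. The sketch below is one way to combine them under those assumptions; the function name detections_to_records, the file name example.jpg and the sample arrays are made up for illustration and are not part of the API.

```python
import json
import numpy as np


def detections_to_records(boxes, classes, scores, category_index,
                          width, height, min_score_thresh=0.6):
    """Pair each box above the threshold with its class name and score.

    Hypothetical helper: assumes boxes has shape (1, N, 4) with normalized
    [ymin, xmin, ymax, xmax] rows, and classes/scores have shape (1, N).
    """
    # Drop the leading batch dimension
    boxes = np.squeeze(boxes, axis=0)
    classes = np.squeeze(classes, axis=0).astype(np.int32)
    scores = np.squeeze(scores, axis=0)

    records = []
    # The same index i selects the box, class and score of one detection
    for i in np.where(scores > min_score_thresh)[0]:
        ymin, xmin, ymax, xmax = boxes[i]
        class_id = int(classes[i])
        name = category_index.get(class_id, {}).get("name", "unknown")
        records.append({
            "class": name,
            "score": float(scores[i]),
            "box": {
                "xmin": int(xmin * width),
                "ymin": int(ymin * height),
                "xmax": int(xmax * width),
                "ymax": int(ymax * height),
            },
        })
    return records


# Made-up detections standing in for the model output
boxes = np.array([[[0.25, 0.125, 0.75, 0.5],
                   [0.5, 0.25, 1.0, 0.75],
                   [0.0, 0.0, 0.125, 0.125]]])
classes = np.array([[18.0, 1.0, 1.0]])
scores = np.array([[0.95, 0.80, 0.10]])
category_index = {1: {"id": 1, "name": "person"},
                  18: {"id": 18, "name": "dog"}}

records = detections_to_records(boxes, classes, scores, category_index,
                                width=640, height=480)
result = {"image_name": "example.jpg", "bounding_boxes": records}
print(json.dumps(result, indent=2))
```

Filtering by index rather than by a boolean mask on boxes alone keeps the class and score attached to each box. If the set of boxes still differs from what is drawn, it may be worth checking that the threshold here matches the min_score_thresh used by visualize_boxes_and_labels_on_image_array.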


Source: https://stackoverflow.com/questions/56291742/how-to-get-the-multiple-bounding-box-coordinates-in-tensorflow-object-detection
