What does the coordinate output of the YOLO algorithm represent?

戏子无情 submitted on 2021-02-08 05:16:57

Question


My question is similar to this topic. I was watching this lecture on bounding box prediction by Andrew Ng when I started thinking about the output of the YOLO algorithm. Consider this example: we use a 19x19 grid and only one receptive field with 2 classes, so the output has shape 19x19x1x5. The last dimension (an array of size 5) represents the following:

1) The class (0 or 1)  
2) X-coordinate  
3) Y-coordinate  
4) Height of the bounding box  
5) Width of the bounding box

I don't understand whether the X, Y coordinates position the bounding box with respect to the entire image or just the receptive field (filter). In the video, the bounding box is shown as part of the receptive field, but logically the receptive field is much smaller than the bounding box, and people might also tinker with the filter size, so positioning bounding boxes with respect to the filter makes no sense.

So, basically, what do the bounding box coordinates of an image represent?


Answer 1:


From the "Understanding YOLO" post on Hacker Noon:

Each grid cell predicts B bounding boxes as well as C class probabilities. The bounding box prediction has 5 components: (x, y, w, h, confidence). The (x, y) coordinates represent the center of the box, relative to the grid cell location (remember that, if the center of the box does not fall inside the grid cell, then this cell is not responsible for it). These coordinates are normalized to fall between 0 and 1. The (w, h) box dimensions are also normalized to [0, 1], relative to the image size. Let’s look at an example:
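To make the quoted convention concrete, here is a minimal Python sketch that converts one prediction into pixel coordinates. It assumes the encoding described in the quote (center offsets relative to the grid cell, width and height relative to the whole image). The function name `decode_box`, the 19x19 grid, and the 608x608 image size are illustrative assumptions, not taken from any particular YOLO implementation. Later YOLO versions use anchor boxes and a different width/height encoding, so this should not be read as how every variant decodes its output.

```python
def decode_box(row, col, x, y, w, h, grid_size=19, img_w=608, img_h=608):
    """Convert one grid-cell prediction to pixel corner coordinates.

    (row, col): index of the grid cell that made the prediction.
    (x, y):     box center as an offset inside that cell, each in [0, 1].
    (w, h):     box width and height as fractions of the whole image.
    grid_size, img_w, img_h are assumed values for illustration.
    """
    # Size of one grid cell in pixels.
    cell_w = img_w / grid_size
    cell_h = img_h / grid_size

    # The center is the cell's top-left corner plus the offset inside the cell.
    center_x = (col + x) * cell_w
    center_y = (row + y) * cell_h

    # Width and height scale with the full image, not with the cell,
    # which is why a box can be much larger than the cell that predicts it.
    box_w = w * img_w
    box_h = h * img_h

    # Return (x_min, y_min, x_max, y_max) in pixels.
    return (
        center_x - box_w / 2,
        center_y - box_h / 2,
        center_x + box_w / 2,
        center_y + box_h / 2,
    )


if __name__ == "__main__":
    # A box centered in the middle of cell (9, 9), a quarter of the image
    # wide and half of the image tall.
    print(decode_box(row=9, col=9, x=0.5, y=0.5, w=0.25, h=0.5))
```

Under these assumptions, the answer to the question is that (x, y) locate the box center inside the responsible grid cell, while (w, h) are measured against the entire image. Neither is tied to the filter or receptive field size.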



Source: https://stackoverflow.com/questions/52455429/what-does-the-coordinate-output-of-yolo-algorithm-represent
