Understanding weird YOLO convolutional layer output size

Posted by 和自甴很熟 on 2021-01-05 09:15:47

Question


I am trying to understand how Darknet works, and I was looking at the yolov3-tiny configuration file, specifically layer number 13 (line 107).

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky

The size of the kernel is 1x1, the stride is 1 and the padding is 1 too. When I load the network using darknet, it indicates that the output width and height are the same as the input:

13 conv    256       1 x 1/ 1     13 x  13 x1024 ->   13 x  13 x 256

However, shouldn't the width and height increase by 2, since the kernel is 1x1 and there is padding? If I understand it correctly, the kernel runs over every "pixel" of the input plus the padding, so it makes sense to me that the width and height should increase by 2*padding.

I used the formula

output_size = ((input_size - kernel_size + 2*padding) / stride) + 1

and it agrees with my expectation of 15 rather than the 13 that darknet reports: (13 - 1 + 2 * 1) / 1 + 1 = 15
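
For reference, the same arithmetic as a minimal Python sketch. The function name `conv_output_size` is made up for illustration and is not part of Darknet; it only evaluates the formula above with integer division.

```python
def conv_output_size(input_size: int, kernel_size: int, padding: int, stride: int) -> int:
    """Output width (or height) of a convolution, per the formula above."""
    return (input_size - kernel_size + 2 * padding) // stride + 1


# Layer 13 of yolov3-tiny, reading pad=1 as "one pixel of padding per side"
print(conv_output_size(input_size=13, kernel_size=1, padding=1, stride=1))  # 15
```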

Does anybody know what I'm missing?

Thank you in advance.


Answer 1:


I figured it out.

I misunderstood the pad parameter in the layer. If you want the padding to be 1, you should write:

padding=1

pad is actually a boolean. When set to one, the padding of the layer will be equal to size / 2.

In this case, the size of the kernel was 1, so the padding ends up being 1/2 = 0 (integer division). Since there is no padding, the output width and height remain the same as the input's.
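
Here is a rough Python sketch of that behaviour, assuming the config parser works as described above (a non-zero pad replaces the padding with size / 2, otherwise padding is used as written). The helper names `resolve_padding` and `conv_output_size` are hypothetical and do not come from Darknet.

```python
def resolve_padding(size: int, pad: int = 0, padding: int = 0) -> int:
    """Effective padding for a [convolutional] section.

    Assumption: `pad` is a boolean flag; when non-zero it overrides
    `padding` with size // 2 (integer division).
    """
    if pad:
        padding = size // 2
    return padding


def conv_output_size(input_size: int, kernel_size: int, padding: int, stride: int) -> int:
    """Standard convolution output-size formula with integer division."""
    return (input_size - kernel_size + 2 * padding) // stride + 1


# Layer 13 as written in the cfg: size=1, stride=1, pad=1
p = resolve_padding(size=1, pad=1)
print(p, conv_output_size(13, 1, p, 1))  # 0 13

# A 3x3 kernel with pad=1 gets padding 1, which also preserves the size
p = resolve_padding(size=3, pad=1)
print(p, conv_output_size(13, 3, p, 1))  # 1 13

# Explicit padding=1 on the 1x1 kernel would give the 15 from the question
p = resolve_padding(size=1, padding=1)
print(p, conv_output_size(13, 1, p, 1))  # 1 15
```

So with pad=1 the layer keeps its spatial size at stride 1 for any odd kernel size, which is presumably why the flag exists.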

I should've RTFM.



Source: https://stackoverflow.com/questions/62483524/understanding-weird-yolo-convolutional-layer-output-size
