Understanding weird YOLO convolutional layer output size

Question

I am trying to understand how Darknet works, and I was looking at the yolov3-tiny configuration file, specifically layer 13 (line 107).

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky

The size of the kernel is 1x1, the stride is 1 and the padding is 1 too. When I load the network using darknet, it indicates that the output width and height are the same as the input:

13 conv    256       1 x 1/ 1     13 x  13 x1024 ->   13 x  13 x 256

However, shouldn't the width and height increase by 2, since the kernel is 1x1 and there is padding? If I understand it correctly, the kernel runs over every "pixel" of the input plus the padding, so it makes sense to me that the width and height should increase by 2*padding.

I used the formula

output_size = ((input_size - kernel_size + 2*padding) / stride) + 1

and it agrees with what I expected: (13 - 1 + 2 * 1) / 1 + 1 = 15

Does anybody know what I'm missing?

Thank you in advance.

Answer 1:

I figured it out.

I misunderstood the pad parameter in the layer. If you want the padding to be 1, you should write:

padding=1

pad is actually a boolean. When it is set to 1, the padding of the layer is set to size / 2 (integer division), regardless of any padding value.

In this case, the size of the kernel was 1, so the padding ends up being 1/2 = 0 (integer operation). Since there is no padding, the output width and height remain the same as in the input.
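To make the arithmetic concrete, here is a minimal Python sketch of the rule described above. It is not Darknet's actual code: the function names effective_padding and conv_output_size are hypothetical, and the sketch assumes that pad and padding both default to 0 when omitted and that the division in the output-size formula is integer division.

```python
def effective_padding(size, pad=0, padding=0):
    """Resolve the padding as described in the answer (assumed behaviour).

    pad is treated as a boolean flag: when truthy, the padding becomes
    size // 2 and any explicit padding value is ignored.
    """
    return size // 2 if pad else padding


def conv_output_size(input_size, size, stride, pad=0, padding=0):
    """Output width/height using the formula from the question."""
    p = effective_padding(size, pad, padding)
    return (input_size - size + 2 * p) // stride + 1


# Layer 13 from the question: size=1, stride=1, pad=1 -> padding is 1 // 2 = 0
print(conv_output_size(13, size=1, stride=1, pad=1))      # 13

# Explicit padding=1 instead of pad=1 gives the size the question expected
print(conv_output_size(13, size=1, stride=1, padding=1))  # 15

# A 3x3 kernel with pad=1 gets padding 3 // 2 = 1, so the size is preserved
print(conv_output_size(13, size=3, stride=1, pad=1))      # 13
```

With pad=1, an odd kernel size and stride 1, the padding is always (size - 1) / 2, which is exactly what is needed to keep the output the same size as the input.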

I should've RTFM.

Source: https://stackoverflow.com/questions/62483524/understanding-weird-yolo-convolutional-layer-output-size

Tags

neural-network

conv-neural-network

yolo

darknet