I believe you are asking how the transition from a convolutional layer to a fully-connected (dense) layer comes about. One way to view a convolutional layer is as a dense layer with sparse connections: each output neuron connects only to a local patch of the input, and the same kernel weights are shared across all patches. This is explained in Goodfellow et al.'s book, Deep Learning, chapter 9.
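To make the "dense layer with sparse connections" view concrete, here is a small sketch (the signal and kernel values are made up for illustration): a 1D valid convolution computed two ways, once as a sliding kernel and once as a matrix-vector product with a mostly-zero weight matrix whose rows are shifted copies of the kernel.

```python
import numpy as np

# A 1D convolution (no padding, stride 1) written two ways:
# as a sliding kernel, and as a dense matrix-vector product.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # input signal, length 5
k = np.array([0.5, -1.0, 0.5])            # kernel, length 3
out_len = len(x) - len(k) + 1             # 3 valid positions

# 1) Sliding-window (convolutional) view
conv = np.array([np.dot(x[i:i + len(k)], k) for i in range(out_len)])

# 2) Dense-layer view: a weight matrix that is mostly zeros,
# with the same kernel weights repeated (shifted) in each row.
W = np.zeros((out_len, len(x)))
for i in range(out_len):
    W[i, i:i + len(k)] = k

dense = W @ x
print(np.allclose(conv, dense))  # the two views agree
```

The sparsity (zeros outside each row's window) and the repeated kernel values in `W` are exactly what distinguish a convolutional layer from an unconstrained dense layer.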
Something similar applies to the output of a pooling operation: you end up with something that resembles the output of a convolutional layer, but summarized. The activations of all the convolutional feature maps can then be flattened and connected to a fully-connected layer. This typically results in a first fully-connected layer with many neurons, so a second (or third) layer is used to do the actual classification/regression.
As to the choice of the number of neurons in a dense layer that comes after a convolutional layer, there is no mathematical rule behind it, unlike with convolutional layers, where the output size is determined by the input size, kernel size, stride, and padding. Since the layer is fully connected, you can choose any size, just as in a typical multi-layer perceptron.
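As a rough sketch of the flattening step described above (the shapes and layer sizes here are arbitrary choices, not prescribed values): the final feature maps are reshaped into one vector, and the dense layer sizes after that point are free hyperparameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical output of a final conv/pool stage: 8 feature maps of 4x4.
feature_maps = rng.standard_normal((8, 4, 4))

# Flatten into a single vector: 8 * 4 * 4 = 128 inputs to the dense layer.
flat = feature_maps.reshape(-1)

# The first dense layer's size is a free choice, e.g. 64 neurons here.
n_hidden = 64
W1 = rng.standard_normal((n_hidden, flat.size)) * 0.01
b1 = np.zeros(n_hidden)
hidden = np.maximum(0.0, W1 @ flat + b1)   # ReLU activation

# A second dense layer does the actual classification, e.g. 10 classes.
W2 = rng.standard_normal((10, n_hidden)) * 0.01
logits = W2 @ hidden
print(flat.size, hidden.shape, logits.shape)
```

Only the flattened size (128 here) is dictated by the preceding layers; `n_hidden` and the number of output classes are up to you.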