I am reading through the residual learning paper, and I have a question: what is the "linear projection" mentioned in section 3.2? It looks like it should be simple once you get it, but I can't grasp the idea...
I am basically not a computer science person, so I would very much appreciate it if someone could provide a simple example.
First up, it's important to understand what x, y and F are and why they need any projection at all. I'll try to explain in simple terms, but a basic understanding of ConvNets is required.

x is the input data (called a tensor) of the layer; in the case of ConvNets its rank is 4. You can think of it as a 4-dimensional array. F is usually a conv layer (conv+relu+batchnorm in this paper), and y combines the two together (forming the output channel). The result of F is also of rank 4, and most of its dimensions will be the same as in x, except for one. That's exactly what the transformation should patch.
For example, the shape of x might be (64, 32, 32, 3), where 64 is the batch size, 32x32 is the image size and 3 stands for the (R, G, B) color channels. F(x) might be (64, 32, 32, 16): the batch size never changes and, for simplicity, the ResNet conv layer doesn't change the image size either, but it will likely use a different number of filters - 16.

So, in order for y = F(x) + x to be a valid operation, x must be "reshaped" from (64, 32, 32, 3) to (64, 32, 32, 16).
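To see why this matters, here is a minimal NumPy sketch (the arrays are just random stand-ins with the shapes from the example above): adding F(x) and x element-wise fails when the channel counts differ.

```python
import numpy as np

# Shapes from the example above: (batch, height, width, channels).
x = np.random.rand(64, 32, 32, 3)    # input to the block
Fx = np.random.rand(64, 32, 32, 16)  # random stand-in for the block's output F(x)

try:
    y = Fx + x                       # y = F(x) + x
except ValueError as e:
    print(e)  # operands could not be broadcast together with shapes ...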
I'd like to stress that "reshaping" here is not what numpy.reshape does.
Instead, the channel dimension (the last axis of x) is padded with 13 zeros, like this:
pad(x=[1, 2, 3], padding=[7, 6]) = [0, 0, 0, 0, 0, 0, 0, 1, 2, 3, 0, 0, 0, 0, 0, 0]
If you think about it, this is a projection of a 3-dimensional vector onto 16 dimensions. In other words, we start to think that our vector is the same, but there are 13 more dimensions out there. None of the other dimensions of x are changed.
Here's the link to the TensorFlow code that does this.
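As a sketch of the same idea (using NumPy's np.pad rather than the actual TensorFlow code, with random stand-in arrays):

```python
import numpy as np

x = np.random.rand(64, 32, 32, 3)    # (batch, height, width, channels)
Fx = np.random.rand(64, 32, 32, 16)  # random stand-in for F(x)

# Pad only the last (channel) axis: 13 zeros appended after the 3 channels;
# (0, 0) means "no padding" on the batch, height and width axes.
x_padded = np.pad(x, pad_width=[(0, 0), (0, 0), (0, 0), (0, 13)])

print(x_padded.shape)  # (64, 32, 32, 16)
y = Fx + x_padded      # the shortcut addition is now valid
```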
A linear projection is one where each new feature is simply a weighted sum of the original features. As in the paper, this can be represented by matrix multiplication. If x is the vector of N input features and W is an M-by-N matrix, then the matrix product Wx yields M new features, where each one is a linear projection of x. Each row of W is a set of weights that defines one of the M linear projections (i.e., each row of W contains the coefficients for one of the weighted sums of x).
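Here is a minimal NumPy sketch of that definition (N, M and the values of x and W are arbitrary):

```python
import numpy as np

N, M = 3, 16                   # project N input features onto M new ones
x = np.array([1.0, 2.0, 3.0])  # N input features
W = np.random.rand(M, N)       # M-by-N weight matrix

y = W @ x                      # each y[i] is the weighted sum of x with row i of W
print(y.shape)                 # (16,)

# Row i of W holds the coefficients of the i-th weighted sum:
print(np.allclose(y[0], np.dot(W[0], x)))  # True
```

In the ResNet paper this weighted sum corresponds to the projection shortcut W_s x in Eqn. (2); applied to a conv feature map it is typically implemented as a 1x1 convolution.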
Source: https://stackoverflow.com/questions/46121283/what-is-linear-projection-in-convolutional-neural-network