Question
In an online textbook on neural networks and deep learning, the author illustrates neural net basics by minimizing a quadratic cost function, which he says is synonymous with mean squared error. Two things about his function have me confused, though (pseudocode below).
$$\mathrm{MSE} \equiv \frac{1}{2n}\sum \left\lVert y_{\text{true}} - y_{\text{pred}} \right\rVert^{2}$$
- Instead of dividing the sum of squared errors by the number of training examples n, why is it divided by 2n? How is that the mean of anything?
- Why is double-bar notation used instead of parentheses? This had me thinking some other calculation was going on, such as an L2 norm, that is not shown explicitly. I suspect that is not the case and that the term is meant to express the plain old sum of squared errors, but it's super confusing.
Any insight you can offer is greatly appreciated!
Answer 1:
The 1/2 factor by which the cost function is multiplied is not important. In fact, you could multiply the cost by any positive real constant and the learning would be the same: scaling the cost scales the gradient by the same constant, which the learning rate can absorb. The 1/2 is used only so that the derivative of the cost function with respect to the output is simply $$y - y_{t}$$, which is convenient in applications like backpropagation.
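To see this concretely, here is a minimal NumPy sketch (the names mse_half, mse_plain, y_true, and y_pred are my own, chosen to match the question's pseudocode) showing that the two conventions differ only by a constant factor in both the cost and the gradient:

```python
import numpy as np

def mse_half(y_true, y_pred):
    """Quadratic cost with the extra 1/2 factor, as in the textbook."""
    n = len(y_true)
    return np.sum((y_true - y_pred) ** 2) / (2 * n)

def mse_plain(y_true, y_pred):
    """Conventional mean squared error, without the 1/2 factor."""
    n = len(y_true)
    return np.sum((y_true - y_pred) ** 2) / n

y_true = np.array([1.0, 0.0, 1.0])
y_pred = np.array([0.8, 0.2, 0.6])
n = len(y_true)

# Gradients with respect to y_pred:
#   d(mse_half)/dy_pred  = (y_pred - y_true) / n      <- the clean form
#   d(mse_plain)/dy_pred = 2 * (y_pred - y_true) / n  <- carries a factor of 2
grad_half = (y_pred - y_true) / n
grad_plain = 2 * (y_pred - y_true) / n

print(mse_half(y_true, y_pred), mse_plain(y_true, y_pred))  # 0.04 vs 0.08
print(grad_plain / grad_half)  # [2. 2. 2.] -- a constant factor everywhere
```

Because the gradients point in the same direction at every point, gradient descent with learning rate η on one cost behaves exactly like gradient descent with rate η/2 on the other; the minimizer is unchanged.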
Answer 2:
The notation ∥v∥ just denotes the usual length function for a vector v; that's straight from the online textbook you referenced.
Find more info on the double bars here. But from what I understand, you can basically view it as the vector version of an absolute value: the Euclidean (L2) length of the vector.
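A quick sketch of that reading: squaring the L2 norm recovers the plain sum of squared components, so once the double-bar term is squared, nothing extra is hiding in the notation. This assumes the outputs are vectors, as in the textbook:

```python
import numpy as np

v = np.array([3.0, -4.0])  # stand-in for y_true - y_pred on one example

# The L2 norm is the square root of the sum of squared components.
norm = np.linalg.norm(v)
print(norm)             # 5.0
print(norm ** 2)        # 25.0
print(np.sum(v ** 2))   # 25.0 -- identical

# So ||y_true - y_pred||^2 is exactly the sum of squared errors
# across the output components of a single training example.
```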
I'm not sure why it says 2n, but it's not always 2n. Wikipedia, for example, writes the function as follows:
$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(Y_i - \hat{Y}_i\right)^2$$
Googling Mean Squared Error also turns up a lot of sources that use the Wikipedia version instead of the one from the online textbook.
Source: https://stackoverflow.com/questions/44038581/mse-cost-function-for-training-neural-network