Question
I am having trouble understanding the difference between an MLP and an SLP.
I know that an MLP has more than one layer (the hidden layers) and that its neurons have a nonlinear activation function, like the logistic function (needed for gradient descent). But I have read that:
"if all neurons in an MLP had a linear activation function, the MLP could be replaced by a single layer of perceptrons, which can only solve linearly separable problems"
I don't understand why, in the specific case of XOR, which is not linearly separable, the equivalent MLP is a two-layer network in which every neuron has a linear activation function, like the step function. I understand that I need two lines for the separation, but in this case I cannot apply the rule from the previous statement (replacing the MLP with an SLP).
MLP for XOR:
http://s17.postimg.org/c7hwv0s8f/xor.png
In the linked image, the neurons A, B and C have a linear activation function (like the step function).
XOR: http://s17.postimg.org/n77pkd81b/xor1.png
Answer 1:
A linear function is f(x) = a x + b. If we take another linear function g(z) = c z + d, and apply g(f(x)) (which would be the equivalent of feeding the output of one linear layer as the input to the next linear layer), we get g(f(x)) = c (a x + b) + d = ac x + cb + d = (ac) x + (cb + d), which is itself another linear function.
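To make the algebra concrete, here is a minimal sketch in Python that checks the collapse numerically. The helper names (linear, compose_linear) and the coefficient values are arbitrary choices for illustration, not anything from the question.

```python
def linear(a, b):
    """Return the function x -> a*x + b."""
    return lambda x: a * x + b


def compose_linear(a, b, c, d):
    """Collapse g(f(x)), with f(x) = a*x + b and g(z) = c*z + d,
    into the single linear function x -> (a*c)*x + (c*b + d)."""
    return linear(a * c, c * b + d)


# Arbitrary integer coefficients, chosen only for illustration.
a, b, c, d = 2, 1, -3, 5
f = linear(a, b)                 # first "layer"
g = linear(c, d)                 # second "layer"
h = compose_linear(a, b, c, d)   # one equivalent layer

# The stacked layers and the single collapsed layer agree on every input.
for x in [-2, 0, 1, 4]:
    assert g(f(x)) == h(x)
```

The same argument applies with weight matrices and bias vectors in place of the scalars, which is why stacking purely linear layers adds no expressive power.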
The step function is not a linear function: you cannot write it as a x + b. That's why an MLP using a step function is strictly more expressive than a single-layer perceptron using a step function.
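As a rough illustration, the sketch below builds a two-layer network with hidden units A and B and output unit C, named after the neurons in the question's figure. The weights and thresholds are my own assumption (A acts as OR, B as AND, C as "A and not B"), since the figure's actual values are not reproduced here. With the step activation it computes XOR; with the identity (a truly linear activation) the same weights collapse to a constant.

```python
def step(z):
    """Heaviside step activation: 1 if z > 0, else 0."""
    return 1 if z > 0 else 0


def identity(z):
    """A genuinely linear activation, for comparison."""
    return z


def two_layer(x1, x2, act):
    """Two-layer network with assumed weights (not taken from the figure)."""
    a = act(x1 + x2 - 0.5)      # hidden unit A: OR when act is the step
    b = act(x1 + x2 - 1.5)      # hidden unit B: AND when act is the step
    return act(a - b - 0.5)     # output unit C: A and not B


inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]

# With the step activation the network reproduces XOR.
assert [two_layer(x1, x2, step) for x1, x2 in inputs] == [0, 1, 1, 0]

# With the identity activation the same weights give the same value everywhere,
# so the network cannot separate the XOR classes.
assert len({two_layer(x1, x2, identity) for x1, x2 in inputs}) == 1
```

With these particular weights the identity version reduces to (x1 + x2 - 0.5) - (x1 + x2 - 1.5) - 0.5, which is constant. Other weights would give some other linear function of x1 and x2, but never XOR.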
Source: https://stackoverflow.com/questions/30559405/multilayer-perceptron-replaced-with-single-layer-perceptron