I have a small, 3-layer neural network with two input neurons, two hidden neurons, and one output neuron. I am trying to stick to the format below, using only 2 hidden neurons.
This is because you have not included any bias for the neurons. You have used only weights to try to fit the XOR model.

With 2 neurons in the hidden layer, the network under-fits because it cannot compensate for the missing bias. With 3 neurons in the hidden layer, the extra neuron counters the effect of the missing bias.
Below is an example of a network for the XOR gate. You'll notice theta (the bias) added to the hidden neurons; this gives the network an additional parameter to tweak.
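Since I can't draw the diagram here, a minimal numpy sketch may help. This is not a definitive implementation, just one hand-picked set of weights and biases (the theta terms) under a step activation that realizes XOR with only 2 hidden neurons:

```python
import numpy as np

def step(v):
    return (v >= 0).astype(float)  # threshold activation

W1 = np.ones((2, 2))          # every input feeds both hidden neurons with weight 1
b1 = np.array([-0.5, -1.5])   # theta: h1 fires for x+y >= 1 (OR), h2 for x+y >= 2 (AND)
W2 = np.array([1.0, -2.0])    # output combines h1 - 2*h2
b2 = -0.5                     # output fires only when h1=1 and h2=0, i.e. OR AND NOT AND

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    h = step(np.array(x) @ W1 + b1)  # hidden layer
    z = step(h @ W2 + b2)            # output layer
    print(x, "->", int(z))           # prints 0, 1, 1, 0
```

Zeroing b1 and b2 in this construction makes both hidden neurons fire for every input, so the output collapses to a constant, which illustrates the missing-bias problem.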
It is an unsolvable system of equations; that is why the NN cannot solve it either. While it may be an oversimplification, if we take the transfer function to be linear, the expression becomes something like
z = (w1*x+w2*y)*w3 + (w4*x+w5*y)*w6
Then there are the 4 cases:

xy=00: z = 0 = 0
xy=10: z = 1 = w1*w3 + w4*w6
xy=01: z = 1 = w2*w3 + w5*w6
xy=11: z = 0 = (w1+w2)*w3 + (w4+w5)*w6
The problem is that

0 = (w1+w2)*w3 + (w4+w5)*w6              <-- the xy=11 line
  = w1*w3 + w2*w3 + w4*w6 + w5*w6
  = (w1*w3 + w4*w6) + (w2*w3 + w5*w6)    <-- the xy=10 and xy=01 lines
  = 1 + 1 = 2
So the seemingly sufficient 6 degrees of freedom are simply not enough here, and that is why you feel the need to add something extra.
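To make the contradiction concrete, here is a small numerical check; the variables a and b are just shorthand I'm introducing for w1*w3 + w4*w6 and w2*w3 + w5*w6. The three nontrivial cases demand a = 1, b = 1, and a + b = 0 at once, and a least-squares solve confirms no exact solution exists:

```python
import numpy as np

# Constraints on a = w1*w3 + w4*w6 and b = w2*w3 + w5*w6:
A = np.array([[1.0, 0.0],   # xy=10:  a     = 1
              [0.0, 1.0],   # xy=01:      b = 1
              [1.0, 1.0]])  # xy=11:  a + b = 0
t = np.array([1.0, 1.0, 0.0])

sol, residual, *_ = np.linalg.lstsq(A, t, rcond=None)
print(sol)       # best compromise, roughly a = b = 1/3
print(residual)  # nonzero residual (~1.33): no exact solution exists
```

No matter how the six weights are chosen, a and b cannot satisfy all three equations, which is exactly the 0 = 2 contradiction above.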