Problems With ANN BackProp/Gradient Checking.

Asked by 执笔经年 on 2021-01-17 03:13

Just wrote up my first Neural Network class in Python. Everything as far as I can tell should work, but there is some bug in it that I can't seem to find (probably staring m…

1 Answer
  • 2021-01-17 03:58

    I'm not very proficient at reading Python code, but your gradient list for XOR contains 3 elements, corresponding to 3 weights. I assume these are two inputs and one bias for a single neuron. If so, such a network cannot learn XOR (the minimal NN that can learn XOR needs two hidden neurons and one output unit). Now, looking at the Feedforward function: if np.dot computes what its name says (i.e., the dot product of two vectors) and sigmoid is a scalar function, then this will always correspond to the output of a single neuron, and I don't see how you could add more neurons to the layers with this code.
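    For illustration, here is a minimal sketch of the forward pass of such a 2-2-1 network, with hand-picked weights that realize XOR. The weight values and function names here are illustrative assumptions, not taken from the question's code:

```python
import numpy as np

def sigmoid(z):
    # Element-wise logistic activation
    return 1.0 / (1.0 + np.exp(-z))

def feedforward_2_2_1(x, W1, b1, W2, b2):
    # x:  (2,)   input vector
    # W1: (2, 2) hidden weights, b1: (2,) hidden biases
    # W2: (2,)   output weights, b2: scalar output bias
    h = sigmoid(np.dot(W1, x) + b1)     # two hidden activations
    return sigmoid(np.dot(W2, h) + b2)  # single output

# One of many weight settings that implement XOR:
W1 = np.array([[20.0, 20.0],      # hidden unit 1 ~ OR(x1, x2)
               [-20.0, -20.0]])   # hidden unit 2 ~ NAND(x1, x2)
b1 = np.array([-10.0, 30.0])
W2 = np.array([20.0, 20.0])       # output ~ AND of the two hidden units
b2 = -30.0

for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    y = feedforward_2_2_1(np.array(x, dtype=float), W1, b1, W2, b2)
    print(x, int(round(y)))  # 0, 1, 1, 0
```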

    The following advice can be useful for debugging any newly implemented NN:

    1) Don't start with MNIST or even XOR. A perfectly good implementation may fail to learn XOR because it can easily fall into a local minimum, and you could spend a lot of time hunting for a non-existent error. A good starting point is the AND function, which can be learned with a single neuron.

    2) Check the forward computation pass by manually computing the results on a few examples; that's easy to do with a small number of weights. Then try to train it with a numerical gradient. If that fails, then either your numerical gradient is wrong (check it by hand) or the training procedure is wrong. (It can fail if you set the learning rate too large, but otherwise training must converge, since the error surface is convex.)

    3) Once you can train it with the numerical gradient, debug your analytical gradients (check the gradient per neuron, and then the gradient for individual weights). These again can be computed manually and compared to what you see. (A sketch illustrating steps 2 and 3 follows this list.)

    4) Upon completion of step 3, if everything works OK, add one hidden layer and repeat steps 2 and 3 with the AND function.

    5) After everything works with AND, you can move on to the XOR function and other more complicated tasks.
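    To make steps 2 and 3 concrete, here is a minimal, self-contained sketch. The single-neuron AND setup, function names, and hyperparameters are illustrative assumptions, not the asker's code. It trains one sigmoid neuron on AND using only a finite-difference gradient, then compares the analytical gradient against the numerical one:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# AND dataset: each input has a constant 1 appended for the bias weight
X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
T = np.array([0.0, 0.0, 0.0, 1.0])

def forward(w, x):
    # Single sigmoid neuron: three weights (two inputs + bias)
    return sigmoid(np.dot(w, x))

def loss(w):
    # Mean squared error over the four AND examples
    preds = np.array([forward(w, x) for x in X])
    return 0.5 * np.mean((preds - T) ** 2)

def numerical_grad(w, eps=1e-5):
    # Central finite differences: (L(w+eps) - L(w-eps)) / (2*eps), per weight
    g = np.zeros_like(w)
    for i in range(len(w)):
        wp, wm = w.copy(), w.copy()
        wp[i] += eps
        wm[i] -= eps
        g[i] = (loss(wp) - loss(wm)) / (2 * eps)
    return g

def analytical_grad(w):
    # dL/dw = mean over examples of (y - t) * y * (1 - y) * x
    g = np.zeros_like(w)
    for x, t in zip(X, T):
        y = forward(w, x)
        g += (y - t) * y * (1 - y) * x
    return g / len(X)

# Step 2: train with the numerical gradient only
w = np.random.randn(3) * 0.1
for _ in range(10000):
    w -= 1.0 * numerical_grad(w)
print("loss after numerical-gradient training:", loss(w))

# Step 3: compare analytical vs numerical gradients at a random point
w_check = np.random.randn(3)
diff = np.max(np.abs(numerical_grad(w_check) - analytical_grad(w_check)))
print("max abs difference:", diff)  # expect roughly 1e-9 or smaller
```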

    This procedure may seem time-consuming, but it almost always results in a working NN in the end.
