I\'m working on creating a 2 layer neural network with back-propagation. The NN is supposed to get its data from a 20001x17 vector that holds following information in each row:<
You can think of y2 as an output probability distribution for each input being one of the 26 alphabet characters, for example if one column of y2 says:
.2
.5
.15
.15
then its 50% probability that this character is B (if we assume only 4 possible outputs).
==REMARK==
The output layer of the NN consists of 26 outputs. Every time the NN is fed an input like the one described above it's supposed to output a 1x26 vector containing zeros in all but the one cell that corresponds to the letter that the input values were meant to represent. for example the output [1 0 0 ... 0] would be letter A, whereas [0 0 0 ... 1] would be the letter Z.
It is preferable to avoid using target values of 0,1 to encode the output of the network.
The reason for avoiding target values of 0 and 1 is that 'logsig' sigmoid transfer function cannot produce these output values given finite weights. If you attempt to train the network to fit target values of exactly 0 and 1, gradient descent will force the weights to grow without bound.
So instead of 0 and 1 values, try using values of 0.04 and 0.9 for example, so that [0.9,0.04,...,0.04] is the target output vector for the letter A.
Reference:
Thomas M. Mitchell, Machine Learning, McGraw-Hill Higher Education, 1997, p114-115