Matlab - Neural network training

有刺的猬 2021-02-06 14:06

I'm working on creating a two-layer neural network with back-propagation. The NN is supposed to get its data from a 20001x17 matrix that holds the following information in each row:

4 answers
  • 2021-02-06 14:09

    You can think of y2 as an output probability distribution over the 26 letters of the alphabet for each input. For example, if one column of y2 reads:

    .2
    .5
    .15
    .15
    

    then there is a 50% probability that this character is B (assuming only 4 possible outputs).
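    A small worked example of reading the column above as a distribution over four letters. The letters A-D and the variable names are assumptions for illustration. Keep in mind that logsig outputs are not guaranteed to sum to 1, so this reading is only exact for columns that do, like this one.

```matlab
% Example column of y2 from above, assuming only 4 possible outputs (A-D).
p = [0.2; 0.5; 0.15; 0.15];
letters = 'ABCD';

% A probability distribution sums to 1 (up to floating-point rounding).
fprintf('sum of column = %.2f\n', sum(p));

% Probability assigned to each letter.
for k = 1:numel(letters)
    fprintf('P(%c) = %.2f\n', letters(k), p(k));
end

% The most likely letter is the one with the largest probability.
[~, idx] = max(p);
mostLikely = letters(idx);
fprintf('most likely: %c\n', mostLikely);   % prints "most likely: B"
```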



    ==REMARK==

    The output layer of the NN consists of 26 outputs. Every time the NN is fed an input like the one described above, it's supposed to output a 1x26 vector containing zeros in all but the one cell that corresponds to the letter the input values are meant to represent. For example, the output [1 0 0 ... 0] would be the letter A, whereas [0 0 0 ... 1] would be the letter Z.

    It is preferable to avoid using target values of 0 and 1 to encode the output of the network.
    The reason is that the 'logsig' sigmoid transfer function cannot produce these output values with finite weights. If you train the network to fit target values of exactly 0 and 1, gradient descent will force the weights to grow without bound.
    So instead of 0 and 1, try values such as 0.04 and 0.9, so that [0.9, 0.04, ..., 0.04] is the target output vector for the letter A.


    Reference:
    Thomas M. Mitchell, Machine Learning, McGraw-Hill Higher Education, 1997, pp. 114-115.
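    A minimal sketch of both target encodings, assuming hypothetical training labels stored as a char array and targets laid out as one 26-element column per sample (the toolbox convention; the question's 1x26 row is the transpose). It also inverts logsig, logsig(n) = 1/(1 + exp(-n)), to show that 0.9 and 0.04 need only finite net inputs while an output of exactly 1 would need an infinite one.

```matlab
% Hypothetical labels for three training samples.
labels = 'ABZ';
idx = double(labels) - double('A') + 1;     % A -> 1, ..., Z -> 26
N = numel(idx);

% Hard one-of-26 targets (26 x N, one column per sample), as in the question.
T01 = zeros(26, N);
T01(sub2ind(size(T01), idx, 1:N)) = 1;

% Soft targets: 0.9 for the correct letter, 0.04 everywhere else.
lo = 0.04; hi = 0.9;
T = lo + (hi - lo) * T01;

% Inverse of logsig: the net input n needed to produce output a.
logit = @(a) log(a ./ (1 - a));
fprintf('net input for 0.9:  %.3f\n', logit(hi));  % finite (about 2.197)
fprintf('net input for 0.04: %.3f\n', logit(lo));  % finite (about -3.178)
fprintf('net input for 1:    %g\n', logit(1));     % Inf: weights would have to diverge
```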

  • 2021-02-06 14:16
    1. Use the hardlim function in the output layer.
    2. Use trainlm or trainrp to train the network.
    3. To train your network, use a for loop with a condition that compares the output with the target; when the match is good enough, use break to exit the training loop.
    4. Use another method instead of mapminmax to pre-process the data set.
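    Points 3 and 4 can be sketched in base MATLAB without any toolbox. This is a hypothetical illustration, not the toolbox workflow: it trains a small two-layer logsig network on made-up 2-D data with plain batch gradient descent (swapped in for trainlm/trainrp), exits the epoch loop with break once the largest output of every sample matches its target class, and z-scores the inputs (the idea behind the toolbox's mapstd) instead of using mapminmax. The data, hidden-layer size, learning rate and stopping rule are all assumptions; implicit expansion needs MATLAB R2016b or later, or Octave.

```matlab
% Hypothetical toy problem: 2-D points around three cluster centers.
rng(0);                                   % reproducible random numbers
centers = [0 4 0; 0 0 4];                 % one cluster center per column
labels  = repelem(1:3, 20);               % 1 x 60 class index per sample
X = centers(:, labels) + 0.5 * randn(2, numel(labels));

% Pre-processing: z-score each input row instead of mapminmax scaling.
X = (X - mean(X, 2)) ./ std(X, 0, 2);

% Soft targets: 0.9 for the true class, 0.04 elsewhere.
N = numel(labels);
T = 0.04 * ones(3, N);
T(sub2ind(size(T), labels, 1:N)) = 0.9;

logsig_ = @(n) 1 ./ (1 + exp(-n));        % same formula as logsig
nHidden = 5; lr = 0.5; maxEpochs = 5000;  % assumed hyperparameters
W1 = 0.5 * randn(nHidden, 2); b1 = zeros(nHidden, 1);
W2 = 0.5 * randn(3, nHidden); b2 = zeros(3, 1);

for epoch = 1:maxEpochs
    % Forward pass (implicit expansion adds the bias to every column).
    A1 = logsig_(W1 * X + b1);
    Y  = logsig_(W2 * A1 + b2);

    % Compare outputs with targets; stop once every sample is classified correctly.
    [~, predicted] = max(Y, [], 1);
    if all(predicted == labels)
        break;
    end

    % Backward pass: gradient of half the mean squared error through logsig.
    d2 = (Y - T) .* Y .* (1 - Y);
    d1 = (W2' * d2) .* A1 .* (1 - A1);
    W2 = W2 - lr * (d2 * A1') / N;   b2 = b2 - lr * mean(d2, 2);
    W1 = W1 - lr * (d1 * X') / N;    b1 = b1 - lr * mean(d1, 2);
end
fprintf('stopped after %d epochs, training accuracy %.2f\n', ...
        epoch, mean(predicted == labels));
```

    With the toolbox you would typically let train handle the loop and stopping criteria instead; the explicit loop here just makes the compare-and-break idea visible.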
  • 2021-02-06 14:18

    I don't know whether this constitutes an actual answer, but here are some remarks.

    • I don't understand your coding scheme. How is an 'A' represented as that set of numbers? It looks like you're falling into a fairly common trap of using arbitrary numbers to code categorical values. Don't do this: for example, if 'a' is 1, 'b' is 2 and 'c' is 3, then your coding has implicitly stated that 'a' is more like 'b' than 'c' (because the network has real-valued inputs, the ordinal properties matter). The proper way is to represent each letter as 26 binary-valued inputs, only one of which is ever active, indicating the letter.
    • Your outputs are as expected: the activation at the output layer will never be exactly 0 or 1, but a real number in between. You could take the max as your activity function, but this is problematic because it's not differentiable, so you can't use back-prop. What you should do is pass the outputs through the softmax function so that they sum to one. You can then treat the outputs as conditional probabilities given the inputs, if you so desire. While the network is not explicitly probabilistic, with the correct activity and activation functions it will be identical in structure to a log-linear model (possibly with latent variables corresponding to the hidden layer), and people do this all the time.
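    A minimal sketch of both suggestions in base MATLAB, with hypothetical letters and net inputs: one-of-26 input coding, and a numerically stable softmax applied column by column. The function name softmax_ is an assumption for this example, chosen so it does not shadow the softmax transfer function that ships with the Deep Learning Toolbox.

```matlab
% Input coding: each letter becomes 26 binary inputs with exactly one active,
% instead of a single arbitrary number such as A = 1, B = 2, C = 3.
letters = 'CAB';                              % hypothetical input letters
idx = double(letters) - double('A') + 1;      % 1 x N letter indices
Xcode = double((1:26)' == idx);               % 26 x N, one 1 per column

% Softmax on the output layer: exponentiate the net inputs and normalize each
% column so it sums to 1. Subtracting the column maximum avoids overflow
% without changing the result.
softmax_ = @(z) exp(z - max(z, [], 1)) ./ sum(exp(z - max(z, [], 1)), 1);

z = [2.0 0.1; 1.0 0.1; 0.1 3.0];              % hypothetical net inputs, 3 x 2
P = softmax_(z);
disp(sum(P, 1));                              % each column sums to 1
```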

    See David MacKay's textbook for a nice introduction to neural nets that makes the probabilistic connection clear. Also take a look at this paper from Geoff Hinton's group, which describes the task of predicting the next character given the context, for details on the correct representation and activation/activity functions (although beware: their method is non-trivial and uses a recurrent net with a different training method).

  • 2021-02-06 14:23

    This is normal. Your output layer is using a log-sigmoid transfer function, and that will always give you some intermediate output between 0 and 1.

    What you would usually do is look for the output with the largest value, in other words, the most likely character.

    This means that, for every column in y2, you're looking for the index of the row that contains the largest value in that column. You can compute this as follows:

    [~, I] = max(y2);

    I is then a vector containing, for each column, the row index of the largest value.
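    The same idea on a small hypothetical y2 (26 rows for A to Z, one column per input), with the row indices mapped back to letters and compared against assumed true labels. The values and the name trueLetters are made up for illustration.

```matlab
% Hypothetical logsig outputs: 26 rows (A..Z), 3 input samples (columns).
y2 = 0.05 * ones(26, 3);
y2(2, 1)  = 0.80;    % sample 1 most resembles B
y2(26, 2) = 0.70;    % sample 2 most resembles Z
y2(1, 3)  = 0.60;    % sample 3 most resembles A

% Row index of the largest value in each column = most likely letter.
[~, I] = max(y2);
predicted = char('A' + I - 1);     % 'BZA'

% If the true letters are known, the hit rate follows directly.
trueLetters = 'BZC';
accuracy = mean(predicted == trueLetters);   % 2 of 3 correct
fprintf('predicted %s, accuracy %.2f\n', predicted, accuracy);
```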
