I know the basics of feedforward neural networks and how to train them using the backpropagation algorithm, but I'm looking for an algorithm that I can use for training an ANN
If the output that led to a reward $r$ is backpropagated into the network $r$ times, you will reinforce the network in proportion to the reward. This is not directly applicable to negative rewards, but I can think of two solutions that produce different effects:
1) If your rewards lie in a range $[r_{min}, r_{max}]$, rescale them to $[0, r_{max} - r_{min}]$ so that they are all non-negative. The bigger the reward, the stronger the reinforcement that is created.
2) For a negative reward $-r$, backpropagate a random output $r$ times, as long as it is different from the one that led to the negative reward. This not only reinforces desirable outputs, but also diffuses, or avoids, bad outputs. Both options are sketched in the code below.
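Here is a minimal sketch of the idea in PyTorch, assuming a small classifier-style network whose discrete outputs play the role of "actions". The network layout, the `reinforce`/`train_step` helpers, the reward range, and the strategy names are all illustrative assumptions, not part of any standard API; the point is only to show reward-proportional repeated backpropagation and the two ways of handling negative rewards.

```python
# Sketch: reinforce an output in proportion to its reward by backpropagating
# it that many times. Names and hyperparameters are illustrative.
import random
import torch
import torch.nn as nn

N_INPUTS, N_HIDDEN, N_ACTIONS = 4, 16, 3

net = nn.Sequential(
    nn.Linear(N_INPUTS, N_HIDDEN),
    nn.Tanh(),
    nn.Linear(N_HIDDEN, N_ACTIONS),
)
optimizer = torch.optim.SGD(net.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

def reinforce(state, action, times):
    """Backpropagate the (state -> action) pair `times` times,
    strengthening that output in proportion to the reward."""
    target = torch.tensor([action])
    for _ in range(times):
        optimizer.zero_grad()
        logits = net(state.unsqueeze(0))
        loss = loss_fn(logits, target)
        loss.backward()
        optimizer.step()

def train_step(state, action, reward, strategy="rescale",
               r_min=-3.0, r_max=3.0):
    if strategy == "rescale":
        # Solution 1: shift every reward into [0, r_max - r_min] so none is
        # negative; bigger rewards still get proportionally more updates.
        reinforce(state, action, int(round(reward - r_min)))
    elif reward >= 0:
        reinforce(state, action, int(round(reward)))
    else:
        # Solution 2: for a reward of -r, reinforce a different, randomly
        # chosen output r times, steering the network away from the bad one.
        alternatives = [a for a in range(N_ACTIONS) if a != action]
        reinforce(state, random.choice(alternatives), int(round(-reward)))

# Hypothetical usage: a dummy state, the action the network produced,
# and the reward observed for it.
state = torch.randn(N_INPUTS)
train_step(state, action=1, reward=2.0)                       # solution 1
train_step(state, action=1, reward=-2.0, strategy="random")   # solution 2
```

Note that repeating the backward pass $r$ times is equivalent (up to optimizer state) to scaling the loss, and hence the gradient, by $r$ for a single update, which is usually the cheaper way to implement the same reinforcement.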