Neural Network and Temporal Difference Learning
Question

I have read a few papers and lectures on temporal difference learning (some as they pertain to neural nets, such as the Sutton tutorial on TD-Gammon), but I am having a difficult time understanding the equations, which leads me to my questions.

- Where does the prediction value V_t come from? And subsequently, how do we get V_(t+1)?
- What exactly is getting backpropagated when TD is used with a neural net? That is, where does the error that gets backpropagated come from when using TD?

Answer 1: