Question
In this blog post on Recurrent Neural Networks, Denny Britz states: "The above diagram has outputs at each time step, but depending on the task this may not be necessary. For example, when predicting the sentiment of a sentence we may only care about the final output, not the sentiment after each word. Similarly, we may not need inputs at each time step."
In the case where we take the output only at the final time step: how does backpropagation change when there are no outputs at the intermediate steps? It seems we need to define a loss at each time step, but how can we do that without outputs?
Answer 1:
It is not true that you "need to define an output at each time step"; backpropagation through time is actually simpler with a single output than with the per-step outputs shown in the image. When there is just one output, simply "rotate your network 90 degrees" and it becomes a regular feed-forward network (with some signals entering the hidden layers directly), so backpropagation works as usual, pushing the partial derivatives back through the system. When there is an output at every step, things get trickier: you typically define the true loss as the sum of all the per-step losses, and consequently you have to sum all the corresponding gradients.
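As a concrete illustration (not part of the original answer), here is a minimal NumPy sketch of BPTT when the loss is defined only on the final output. The network sizes, the tanh recurrence, and the squared-error loss are all illustrative assumptions; the point is that the gradient enters once, at the last step, and is then pushed backward through time like in a deep feed-forward net.

```python
import numpy as np

rng = np.random.default_rng(0)
T, n_in, n_hid = 5, 3, 4                      # sequence length, input size, hidden size

W_xh = rng.normal(scale=0.1, size=(n_hid, n_in))   # input -> hidden
W_hh = rng.normal(scale=0.1, size=(n_hid, n_hid))  # hidden -> hidden
W_hy = rng.normal(scale=0.1, size=(1, n_hid))      # hidden -> output (used once, at the last step)

x = rng.normal(size=(T, n_in))                # one input per time step
y_true = np.array([1.0])                      # a single target for the whole sequence

# Forward pass: unroll the recurrence; no output at intermediate steps.
hs = [np.zeros(n_hid)]
for t in range(T):
    hs.append(np.tanh(W_xh @ x[t] + W_hh @ hs[-1]))
y = W_hy @ hs[-1]                             # the only output
loss = 0.5 * np.sum((y - y_true) ** 2)

# Backward pass: the loss gradient enters at the final step only.
dW_xh, dW_hh = np.zeros_like(W_xh), np.zeros_like(W_hh)
dW_hy = np.outer(y - y_true, hs[-1])
dh = W_hy.T @ (y - y_true)                    # gradient w.r.t. the final hidden state
for t in reversed(range(T)):
    dz = (1 - hs[t + 1] ** 2) * dh            # backprop through tanh
    dW_xh += np.outer(dz, x[t])
    dW_hh += np.outer(dz, hs[t])
    dh = W_hh.T @ dz                          # pass the gradient to the previous step
```

Note that even here the weight gradients are accumulated across time steps, because the same W_xh and W_hh are reused at every step. In the per-step-output case you would additionally inject a loss gradient into dh at every t, which is exactly the "sum of all the small losses" described above.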
Source: https://stackoverflow.com/questions/42725726/rnn-back-propagation-through-time-when-output-is-taken-only-at-final-timestep