Usually in an RNN, only the current input and the previous hidden state are used to calculate the output. However, what would happen if we used up to n previous steps? In essence, feeding an n-g