I\'m still fairly new to neural networks, so sorry on beforehand for any ambiguities to the following.
In a "standard" LSTM implementation for language task