Taking the last state from BiLSTM (BiGRU) in PyTorch
Question

After reading several articles, I am still quite confused about whether my implementation of getting the last hidden states from a BiLSTM is correct.

1. Understanding Bidirectional RNN in PyTorch (TowardsDataScience)
2. PackedSequence for seq2seq model (PyTorch forums)
3. What's the difference between “hidden” and “output” in PyTorch LSTM? (StackOverflow)
4. Select tensor in a batch of sequences (PyTorch forums)

The approach from the last source (4) seems the cleanest to me, but I am still uncertain if I