Question
This is the example given in the documentation of the Hugging Face transformers PyTorch library:
from transformers import BertTokenizer, BertForTokenClassification
import torch

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertForTokenClassification.from_pretrained('bert-base-uncased',
                                                   output_hidden_states=True,
                                                   output_attentions=True)
input_ids = torch.tensor(tokenizer.encode("Hello, my dog is cute",
                                          add_special_tokens=True)).unsqueeze(0)  # Batch size 1
labels = torch.tensor([1] * input_ids.size(1)).unsqueeze(0)  # Batch size 1
outputs = model(input_ids, labels=labels)
loss, scores, hidden_states, attentions = outputs
Here hidden_states is a tuple of length 13, containing the hidden states of the model at the output of each layer plus the initial embedding output. I would like to know whether hidden_states[0] or hidden_states[12] represents the final hidden-state vectors.
Answer 1:
If you check the source code, specifically BertEncoder, you can see that the returned hidden states are initialized as an empty tuple, and the current hidden state is appended to it at the start of each layer's iteration. The output of the final layer is then appended as the last element after the loop, see here, so hidden_states[12] contains the final hidden-state vectors (and hidden_states[0] the embedding output).
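To make the ordering concrete, here is a minimal, self-contained sketch of the append pattern used in BertEncoder's forward pass. Toy linear layers stand in for the real transformer layers, and the variable names are illustrative, not the library's:

import torch

# Toy stand-ins for the 12 encoder layers of bert-base and its embedding output.
layers = [torch.nn.Linear(4, 4) for _ in range(12)]
hidden = torch.randn(1, 3, 4)  # pretend this is the embedding output

all_hidden_states = ()
for layer in layers:
    # The *input* to each layer is appended before the layer runs,
    # so all_hidden_states[0] is the embedding output.
    all_hidden_states = all_hidden_states + (hidden,)
    hidden = layer(hidden)
# The final layer's output is appended once, after the loop.
all_hidden_states = all_hidden_states + (hidden,)

print(len(all_hidden_states))           # 13: embeddings + 12 layers
print(all_hidden_states[12] is hidden)  # True: index 12 is the final output

You can also check this against the real model. Assuming a recent transformers version (4.x, where outputs are ModelOutput objects rather than plain tuples as in the question's code), the last entry of hidden_states equals BertModel's last_hidden_state:

from transformers import BertTokenizer, BertModel
import torch

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased', output_hidden_states=True)
model.eval()

inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs)

print(len(out.hidden_states))                                     # 13
print(torch.equal(out.hidden_states[-1], out.last_hidden_state))  # True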
Source: https://stackoverflow.com/questions/60847291/confusion-in-understanding-the-output-of-bertfortokenclassification-class-from-t