Question
I'm using PyTorch and the pretrained base BERT model to classify sentences for hate speech. I want to add a Bi-LSTM layer that takes as input all the outputs of the last transformer encoder layer of BERT, wrapped in a new model (a class that implements nn.Module), and I got confused by the nn.LSTM parameters. I tokenized the data, and I load the model with:
bert = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=int(data['class'].nunique()), output_attentions=False, output_hidden_states=False)
My dataset has two columns: class (the label) and sentence. Can someone help me with this? Thank you in advance.
Edit: Also, after the Bi-LSTM processes the input, the network should send the final hidden state to a fully connected layer that performs classification using the softmax activation function. How can I do that?
Answer 1:
You can do it as follows:
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizerFast

class CustomBERTModel(nn.Module):
    def __init__(self, num_classes):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        # New layers:
        self.lstm = nn.LSTM(768, 256, batch_first=True, bidirectional=True)
        self.linear = nn.Linear(256 * 2, num_classes)

    def forward(self, ids, mask):
        # Index 0 of the BERT output is the last hidden state,
        # with shape (batch_size, sequence_length, 768)
        sequence_output = self.bert(ids, attention_mask=mask)[0]
        # lstm_output has shape (batch_size, sequence_length, 2 * 256)
        lstm_output, (h, c) = self.lstm(sequence_output)
        # Final state of the forward direction is at the last time step;
        # final state of the backward direction is at the first time step
        hidden = torch.cat((lstm_output[:, -1, :256], lstm_output[:, 0, 256:]), dim=-1)
        # Classify from the concatenated final states of both directions
        linear_output = self.linear(hidden)
        return linear_output

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
# `data` is the DataFrame from the question; num_classes is its number of distinct labels
model = CustomBERTModel(num_classes=int(data['class'].nunique()))
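To see how the nn.LSTM shapes fit together, and where softmax comes in, here is a minimal sketch that swaps BERT for a random tensor of the same shape so it runs without downloading any weights. The batch size, sequence lengths, number of classes, and labels are made-up values for illustration only. It also shows one possible way to keep padding tokens out of the final hidden state, using pack_padded_sequence with lengths derived from the attention mask. This is an optional variation, not part of the answer above.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence

torch.manual_seed(0)

# Illustrative sizes (assumptions, not taken from the question)
batch_size, seq_len, bert_dim, lstm_dim, num_classes = 4, 10, 768, 256, 3

# Stand-in for BERT's last hidden state: (batch_size, seq_len, 768)
sequence_output = torch.randn(batch_size, seq_len, bert_dim)
# Hypothetical attention mask: 1 for real tokens, 0 for padding
lengths = torch.tensor([10, 7, 5, 3])
mask = (torch.arange(seq_len)[None, :] < lengths[:, None]).long()

lstm = nn.LSTM(bert_dim, lstm_dim, batch_first=True, bidirectional=True)
linear = nn.Linear(2 * lstm_dim, num_classes)

# Without packing: h_n has shape (2, batch_size, 256) for one bidirectional layer.
# h_n[-2] is the forward direction's final state, h_n[-1] the backward one.
lstm_output, (h_n, c_n) = lstm(sequence_output)
hidden_from_output = torch.cat(
    (lstm_output[:, -1, :lstm_dim], lstm_output[:, 0, lstm_dim:]), dim=-1
)
hidden_from_h_n = torch.cat((h_n[-2], h_n[-1]), dim=-1)
same_hidden = torch.allclose(hidden_from_output, hidden_from_h_n, atol=1e-6)

# With packing: the LSTM stops at each sequence's true last token,
# so the final states are not computed from padding positions.
packed = pack_padded_sequence(
    sequence_output, mask.sum(dim=1).cpu(), batch_first=True, enforce_sorted=False
)
_, (h_n_packed, _) = lstm(packed)
hidden = torch.cat((h_n_packed[-2], h_n_packed[-1]), dim=-1)  # (batch_size, 512)

logits = linear(hidden)                # (batch_size, num_classes)
probs = torch.softmax(logits, dim=-1)  # class probabilities, e.g. for inference

# For training, pass the raw logits to CrossEntropyLoss,
# which applies log-softmax internally.
labels = torch.tensor([0, 2, 1, 0])    # made-up labels
loss = nn.CrossEntropyLoss()(logits, labels)
```

Because nn.CrossEntropyLoss already includes the log-softmax step, the model can return logits during training, with torch.softmax applied only when probabilities are needed.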
Source: https://stackoverflow.com/questions/65205582/how-can-i-add-a-bi-lstm-layer-on-top-of-bert-model