BertForSequenceClassification feeds the [CLS] token's representation to a linear classifier. I want to use the representation of another token (say, [X] in the input sequence) instead of [CLS].
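To make the idea concrete, here is a minimal PyTorch sketch of what I have in mind. It is not the library's implementation: `TokenPositionClassifier` and `marker_token_id` are hypothetical names, and the id 999 for [X] is made up. I am assuming the encoder output has shape `(batch, seq_len, hidden_size)` (for example `last_hidden_state` from `BertModel`), that [X] has been added to the tokenizer vocabulary, and that it appears at least once in every sequence.

```python
import torch
import torch.nn as nn


class TokenPositionClassifier(nn.Module):
    """Hypothetical head: classify from the hidden state of a marker token, not [CLS]."""

    def __init__(self, hidden_size, num_labels, marker_token_id, dropout=0.1):
        super().__init__()
        self.marker_token_id = marker_token_id  # assumed vocabulary id of [X]
        self.dropout = nn.Dropout(dropout)
        self.classifier = nn.Linear(hidden_size, num_labels)

    def forward(self, last_hidden_state, input_ids):
        # Boolean mask of where [X] occurs, shape (batch, seq_len).
        is_marker = input_ids == self.marker_token_id
        if not bool(is_marker.any(dim=1).all()):
            raise ValueError("every sequence must contain the marker token")
        # argmax on the 0/1 mask gives the first occurrence of [X] in each row.
        positions = is_marker.int().argmax(dim=1)
        batch_idx = torch.arange(input_ids.size(0), device=input_ids.device)
        # Pick one hidden vector per sequence, shape (batch, hidden_size).
        marker_state = last_hidden_state[batch_idx, positions]
        return self.classifier(self.dropout(marker_state))


# Stand-in for encoder output: 2 sequences, 5 tokens each, hidden size 8.
hidden = torch.randn(2, 5, 8)
input_ids = torch.tensor([[101, 7, 999, 8, 102],
                          [101, 999, 7, 8, 102]])
head = TokenPositionClassifier(hidden_size=8, num_labels=3, marker_token_id=999)
logits = head(hidden, input_ids)
print(logits.shape)  # torch.Size([2, 3])
```

With a real model, `hidden` would come from the encoder rather than `torch.randn`, and the loss would be computed on `logits` the same way as with the stock classification head.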