As you may know, RoBERTa (BERT, etc.) has its own tokenizer and sometimes you get pieces of given word as tokens, e.g. embeddings » embed, #dings
Since the nature of the