I am trying to implement the NER example using BERT and PyTorch from the Hugging Face guide (https://huggingface.co/transformers/custom_datasets.html#ft-trainer). Reading in