Question
I am pretty new to Hugging Face Transformers. I am facing the following issue when I try to load the xlm-roberta-base model from a given path:
>>> tokenizer = AutoTokenizer.from_pretrained(model_path)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/user/anaconda3/lib/python3.7/site-packages/transformers/tokenization_auto.py", line 182, in from_pretrained
    return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
  File "/home/user/anaconda3/lib/python3.7/site-packages/transformers/tokenization_utils.py", line 309, in from_pretrained
    return cls._from_pretrained(*inputs, **kwargs)
  File "/home/user/anaconda3/lib/python3.7/site-packages/transformers/tokenization_utils.py", line 458, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
  File "/home/user/anaconda3/lib/python3.7/site-packages/transformers/tokenization_roberta.py", line 98, in __init__
    **kwargs,
  File "/home/user/anaconda3/lib/python3.7/site-packages/transformers/tokenization_gpt2.py", line 133, in __init__
    with open(vocab_file, encoding="utf-8") as vocab_handle:
TypeError: expected str, bytes or os.PathLike object, not NoneType
However, if I load it by its name, there is no problem:
>>> tokenizer = AutoTokenizer.from_pretrained('xlm-roberta-base')
I would appreciate any help.
Answer 1:
I assume you have created that directory as described in the documentation with:
tokenizer.save_pretrained('YOURPATH')
There is currently an issue under investigation that only affects the AutoTokenizer, not the underlying tokenizers such as XLMRobertaTokenizer. For example, the following should work:
from transformers import XLMRobertaTokenizer
tokenizer = XLMRobertaTokenizer.from_pretrained('YOURPATH')
To make this work with the AutoTokenizer, you also need to save the config so everything can be loaded offline:
from transformers import AutoTokenizer, AutoConfig
tokenizer = AutoTokenizer.from_pretrained('xlm-roberta-base')
config = AutoConfig.from_pretrained('xlm-roberta-base')
tokenizer.save_pretrained('YOURPATH')
config.save_pretrained('YOURPATH')
tokenizer = AutoTokenizer.from_pretrained('YOURPATH')
I recommend either using different paths for the tokenizer and the model, or keeping your model's config.json. Some modifications you apply to your model are stored in the config.json that is created during model.save_pretrained(), and that file will be overwritten when you save the tokenizer's config to the same directory afterwards as described above (i.e. you won't be able to load your modified model with the tokenizer's config.json).
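For illustration, here is a minimal sketch of the separate-paths layout. The subdirectory names 'YOURPATH/model' and 'YOURPATH/tokenizer' are just an assumption for this example; any two distinct directories work:
from transformers import AutoConfig, AutoModel, AutoTokenizer

model = AutoModel.from_pretrained('xlm-roberta-base')
tokenizer = AutoTokenizer.from_pretrained('xlm-roberta-base')
config = AutoConfig.from_pretrained('xlm-roberta-base')

# With separate directories, the config.json written by
# model.save_pretrained() cannot be overwritten by the config
# saved alongside the tokenizer.
model.save_pretrained('YOURPATH/model')
tokenizer.save_pretrained('YOURPATH/tokenizer')
config.save_pretrained('YOURPATH/tokenizer')

# Both load back offline from their own paths:
model = AutoModel.from_pretrained('YOURPATH/model')
tokenizer = AutoTokenizer.from_pretrained('YOURPATH/tokenizer')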
Source: https://stackoverflow.com/questions/62641972/hugging-face-transformers-loading-model-from-path-error