Hugging-Face Transformers: Loading model from path error

别来无恙 提交于 2020-07-10 10:28:16

问题


I am pretty new to Hugging-Face transformers. I am facing the following issue when I try to load xlm-roberta-base model from a given path:

>> tokenizer = AutoTokenizer.from_pretrained(model_path)
>> Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/user/anaconda3/lib/python3.7/site-packages/transformers/tokenization_auto.py", line 182, in from_pretrained
    return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
  File "/home/user/anaconda3/lib/python3.7/site-packages/transformers/tokenization_utils.py", line 309, in from_pretrained
    return cls._from_pretrained(*inputs, **kwargs)
  File "/home/user/anaconda3/lib/python3.7/site-packages/transformers/tokenization_utils.py", line 458, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
  File "/home/user/anaconda3/lib/python3.7/site-packages/transformers/tokenization_roberta.py", line 98, in __init__
    **kwargs,
  File "/home/user/anaconda3/lib/python3.7/site-packages/transformers/tokenization_gpt2.py", line 133, in __init__
    with open(vocab_file, encoding="utf-8") as vocab_handle:
TypeError: expected str, bytes or os.PathLike object, not NoneType

However, if I load it by its name, there is no problem:

>> tokenizer = AutoTokenizer.from_pretrained('xlm-roberta-base')

I would appreciate any help.


回答1:


I assume you have created that directory as described in the documentation with :

tokenizer.save_pretrained('YOURPATH')

There is currently an issue under investigation which only affects the AutoTokenizers but not the underlying tokenizers like (XLMRobertaTokenizer). For example the following should work:

from transformers import XLMRobertaTokenizer

tokenizer = XLMRobertaTokenizer.from_pretrained('YOURPATH')

To work with the AutoTokenizer you also need to save the config to load it offline:

from transformers import AutoTokenizer, AutoConfig

tokenizer = AutoTokenizer.from_pretrained('xlm-roberta-base')
config = AutoConfig.from_pretrained('xlm-roberta-base')

tokenizer.save_pretrained('YOURPATH')
config.save_pretrained('YOURPATH')

tokenizer = AutoTokenizer.from_pretrained('YOURPATH')

I recommend to either use a different path for the tokenizers and the model or to keep the config.json of your model because some modifications you apply to your model will be stored in the config.json which is created during model.save_pretrained() and will be overwritten when you save the tokenizer as described above after your model (i.e. you won't be able to load your modified model with tokenizer config.json).



来源:https://stackoverflow.com/questions/62641972/hugging-face-transformers-loading-model-from-path-error

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!