问题
I have trained my own word2vec model in gensim and I am trying to load that model in spacy. First, I need to save it in my disk and then try to load an init-model in spacy but unable to figure out exactly how.
gensimmodel
Out[252]:
<gensim.models.word2vec.Word2Vec at 0x110b24b70>
import spacy
spacy.load(gensimmodel)
OSError: [E050] Can't find model 'Word2Vec(vocab=250, size=1000, alpha=0.025)'. It doesn't seem to be a shortcut link, a Python package or a valid path to a data directory.
回答1:
Train and save your model in plain-text format:
from gensim.test.utils import common_texts, get_tmpfile
from gensim.models import Word2Vec
path = get_tmpfile("./data/word2vec.model")
model = Word2Vec(common_texts, size=100, window=5, min_count=1, workers=4)
model.wv.save_word2vec_format("./data/word2vec.txt")
Gzip the text file:
gzip word2vec.txt
Which produces a word2vec.txt.gz
file.
Run the following command:
python -m spacy init-model en ./data/spacy.word2vec.model --vectors-loc word2vec.txt.gz
Load the vectors using:
nlp = spacy.load('./data/spacy.word2vec.model/')
回答2:
As explained here, you can import custom word vectors that trained using Gensim, Fast Text, or Tomas Mikolov's original word2vec implementation, by creating a model using:
wget https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.la.300.vec.gz
python -m spacy init-model en your_model --vectors-loc cc.la.300.vec.gz
then you can load you model, nlp = spacy.load('your_model')
and use it!
Also see the similar question that answered here.
来源:https://stackoverflow.com/questions/50466643/in-spacy-how-to-use-your-own-word2vec-model-created-in-gensim