After training a word2vec model with Python's gensim, how do you find the number of words in the model's vocabulary?
One more way to get the vocabulary size is from the embedding matrix itself as in:
In [33]: from gensim.models import Word2Vec
# load the pretrained model
In [34]: model = Word2Vec.load(pretrained_model)
# get the shape of embedding matrix
In [35]: model.wv.vectors.shape
Out[35]: (662109, 300)
# `vocabulary_size` is just the number of rows (i.e. axis 0)
In [36]: model.wv.vectors.shape[0]
Out[36]: 662109
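As a quick sanity check, the same idea works on a freshly trained toy model. This is a minimal sketch, assuming gensim 4.x, where the constructor parameter is vector_size (in 3.x it was size); the corpus here is purely illustrative:

from gensim.models import Word2Vec

# tiny illustrative corpus with 3 unique tokens
sentences = [["hello", "world"], ["hello", "gensim"]]
model = Word2Vec(sentences, vector_size=50, min_count=1)

# each row of the embedding matrix is one vocabulary word
print(model.wv.vectors.shape[0])  # 3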
The vocabulary is in the vocab field of the Word2Vec model's wv property, as a dictionary, with the keys being each token (word). So it's just the usual Python for getting a dictionary's length:
len(w2v_model.wv.vocab)
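If you want to convince yourself that this agrees with the embedding-matrix approach above, a one-line check (a sketch, assuming a pre-4.0 gensim where wv.vocab still exists):

# dictionary entries and embedding rows count the same vocabulary
assert len(w2v_model.wv.vocab) == w2v_model.wv.vectors.shape[0]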
(In older gensim versions before 0.13, vocab appeared directly on the model, so you would use w2v_model.vocab instead of w2v_model.wv.vocab.)
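Note that in gensim 4.0 and later, the vocab dictionary was removed from KeyedVectors; key_to_index plays the same role there, and KeyedVectors also supports len() directly:

# gensim >= 4.0: `vocab` is gone, use `key_to_index` instead
len(w2v_model.wv.key_to_index)
# or equivalently, since KeyedVectors implements __len__:
len(w2v_model.wv)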