gensim word2vec: Find number of words in vocabulary

鱼传尺愫 2021-01-31 07:46

After training a word2vec model with gensim in Python, how do you find the number of words in the model's vocabulary?

2 answers
  •  一生所求
    2021-01-31 08:48

    Another way to get the vocabulary size is to read it off the embedding matrix itself, which has one row per word:

    In [33]: from gensim.models import Word2Vec
    
    # load the pretrained model (`pretrained_model` is the path to a saved model file)
    In [34]: model = Word2Vec.load(pretrained_model)
    
    # get the shape of embedding matrix    
    In [35]: model.wv.vectors.shape
    Out[35]: (662109, 300)
    
    # `vocabulary_size` is just the number of rows (i.e. axis 0)
    In [36]: model.wv.vectors.shape[0]
    Out[36]: 662109
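
The vocabulary size can also be read from the vocabulary mapping rather than the matrix shape. The sketch below is one possible version-tolerant helper, not part of the original answer. It assumes gensim 4.x exposes the mapping as `model.wv.key_to_index` and gensim 3.x as `model.wv.vocab`. The name `vocab_size` is hypothetical, and the `SimpleNamespace` objects are stand-ins for a trained model so the snippet runs without gensim installed.

```python
from types import SimpleNamespace


def vocab_size(model):
    """Return the number of words known to a Word2Vec-like model (hypothetical helper)."""
    wv = model.wv

    # Assumption: gensim 4.x stores the word -> row index mapping in `key_to_index`.
    key_to_index = getattr(wv, "key_to_index", None)
    if key_to_index is not None:
        return len(key_to_index)

    # Assumption: gensim 3.x stores the vocabulary in the `vocab` dict instead.
    vocab = getattr(wv, "vocab", None)
    if vocab is not None:
        return len(vocab)

    # Fallback from the answer above: one row of the embedding matrix per word.
    return wv.vectors.shape[0]


# Stand-in objects that mimic the attribute layout; these are not real gensim models.
new_style = SimpleNamespace(wv=SimpleNamespace(key_to_index={"cat": 0, "dog": 1, "fish": 2}))
old_style = SimpleNamespace(wv=SimpleNamespace(vocab={"cat": None, "dog": None}))
matrix_only = SimpleNamespace(wv=SimpleNamespace(vectors=SimpleNamespace(shape=(5, 300))))

print(vocab_size(new_style))    # 3
print(vocab_size(old_style))    # 2
print(vocab_size(matrix_only))  # 5
```

With a real model loaded as in the answer above, the call would be `vocab_size(model)`.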
    
