Understanding the output of Doc2Vec from Gensim package

暖寄归人 2021-01-05 09:43

I have some sample sentences that I want to run through a Doc2Vec model. My end goal is a matrix of size (num_sentences, num_features).

I'm using the Gensim package

2 Answers
  •  别那么骄傲
    2021-01-05 10:14

    model.docvecs is an iterable whose length equals the number of documents you supplied to the model. Each docvec is the vector representation of a single document. Its length is set by the size parameter you passed when training the model (renamed vector_size in Gensim 4, where model.docvecs also became model.dv). Values between 100 and 300 are common, and larger ones are sometimes used. A vector of length 10 would likely do a poor job of representing the documents you fed it.

    Thus, something like this would be more productive:

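    import gensim as gn
    docs = []  # one TaggedDocument per tokenized sentence, tagged with its index
    for i, tokens in enumerate(lot):
        docs.append(gn.models.doc2vec.TaggedDocument(words=tokens, tags=[i]))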
    

    Where lot is a list of lists of tokens (words) like this:

    lot = [['the','cat','sat'],['the','dog','ran']]
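    If you start from raw sentences rather than ready-made token lists, one way to build lot is sketched below. It assumes a simple lowercase-and-regex tokenizer; the tokenize helper and the sample sentences are hypothetical. gensim.utils.simple_preprocess is a ready-made alternative, though by default it also drops tokens shorter than two characters.

```python
import re

# Hypothetical raw input; substitute your own sample sentences.
sentences = ["The cat sat.", "The dog ran!"]


def tokenize(text):
    """Lowercase the text and keep runs of letters, digits, and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())


# A list of lists of tokens, one inner list per sentence.
lot = [tokenize(s) for s in sentences]
print(lot)  # [['the', 'cat', 'sat'], ['the', 'dog', 'ran']]
```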
    

    Running the model:

    # Gensim 4 renamed `size` to `vector_size`; keep a reference to the trained model
    model = gn.models.doc2vec.Doc2Vec(docs, vector_size=300, window=8, dm=1, hs=1, alpha=0.025, min_alpha=0.0001)
    
