Is there a pre-trained doc2vec model?

無奈伤痛 2021-01-11 19:27

Is there a pre-trained doc2vec model with a large data set, like Wikipedia or similar?

2 answers
  •  小蘑菇 (OP)
    2021-01-11 20:31

    I don't know of any good one. There's one linked from this project, but:

    • it's based on a custom fork of an older gensim, so it won't load in recent gensim versions
    • it's not clear what parameters or data it was trained with, and the associated paper may have made uninformed choices about the effects of parameters
    • it doesn't appear to be the right size to include actual doc-vectors for either Wikipedia articles (4-million-plus) or article paragraphs (tens-of-millions), or a significant number of word-vectors, so it's unclear what's been discarded
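
    The size argument in the last point can be checked with rough arithmetic. The sketch below is only an estimate under stated assumptions: the 300-dimension vector size and the 20-million paragraph count are illustrative guesses (the linked model's actual settings are not known), and vectors are taken to be 32-bit floats, which is what gensim stores by default.

```python
# Rough lower bound on the space needed just for the raw doc-vectors.
# ASSUMPTIONS: vector_size=300 and 20 million paragraphs are illustrative
# figures, not values taken from the model discussed above.

BYTES_PER_FLOAT32 = 4  # gensim keeps vectors as 32-bit floats by default


def doc_vectors_size_gb(n_docs: int, vector_size: int) -> float:
    """Return the size in GB (10**9 bytes) of n_docs float32 vectors."""
    return n_docs * vector_size * BYTES_PER_FLOAT32 / 1e9


articles_gb = doc_vectors_size_gb(4_000_000, 300)     # one vector per article
paragraphs_gb = doc_vectors_size_gb(20_000_000, 300)  # one vector per paragraph

print(f"articles:   {articles_gb:.1f} GB")    # articles:   4.8 GB
print(f"paragraphs: {paragraphs_gb:.1f} GB")  # paragraphs: 24.0 GB
```

    A download much smaller than these figures cannot contain a full set of doc-vectors at that dimensionality, before even counting word-vectors.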

    Training takes a long time and a significant amount of working RAM, but gensim includes a Jupyter notebook that demonstrates building a Doc2Vec model from Wikipedia:

    https://github.com/RaRe-Technologies/gensim/blob/develop/docs/notebooks/doc2vec-wikipedia.ipynb

    So, I would recommend fixing the mistakes in your attempt. (And, if you succeed in creating a model, and want to document it for others, you could upload it somewhere for others to re-use.)
