Issues in doc2vec tags in Gensim

Submitted by 大城市里の小女人 on 2019-12-12 04:46:56

Question


I am using gensim doc2vec as below.

from gensim.models import doc2vec
from collections import namedtuple
import re

my_d = {'recipe__001__1': 'recipe 1 details should come here',
 'recipe__001__2': 'Ingredients of recipe 2 need to be added'}
docs = []
analyzedDocument = namedtuple('AnalyzedDocument', 'words tags')
for key, value in my_d.items():
    value = re.sub("[^a-zA-Z]"," ", value)
    words = value.lower().split()
    tags = key
    docs.append(analyzedDocument(words, tags))
model = doc2vec.Doc2Vec(docs, size = 300, window = 10, dm=1, negative=5, hs=0, min_count = 1, workers = 4, iter = 20)

However, when I check model.docvecs.offset2doctag I get ['r', 'e', 'c', 'i', 'p', '_', '0', '1', '2'] as the output. The expected output is 'recipe__001__1' and 'recipe__001__2'.

When I use len(model.docvecs.doctag_syn0) I get 9 as the output, but the value should be 2 because I only have 2 recipes in my test dictionary.

Why does this happen?


Answer 1:


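Doc2Vec expects tags to be a list of tags. A bare string is iterated character by character, so each distinct character becomes its own tag, which is why you get 9 tags instead of 2. Change this line: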

tags = key

to

tags = [key]
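
To see why the bare string produces nine tags, here is a minimal sketch that runs without gensim. The helper collect_tags is hypothetical: it only imitates the idea of iterating over each document's tags and recording the distinct ones in order, and it is not gensim's actual implementation. build_docs and its wrap_tag_in_list flag are likewise names introduced for illustration.

```python
from collections import namedtuple
import re

AnalyzedDocument = namedtuple('AnalyzedDocument', 'words tags')

my_d = {'recipe__001__1': 'recipe 1 details should come here',
        'recipe__001__2': 'Ingredients of recipe 2 need to be added'}


def build_docs(wrap_tag_in_list):
    """Build the documents as in the question, optionally wrapping the tag in a list."""
    docs = []
    for key, value in my_d.items():
        words = re.sub("[^a-zA-Z]", " ", value).lower().split()
        tags = [key] if wrap_tag_in_list else key
        docs.append(AnalyzedDocument(words, tags))
    return docs


def collect_tags(docs):
    """Hypothetical stand-in for tag bookkeeping: iterate over each
    document's tags and keep the distinct ones in first-seen order."""
    seen = []
    for doc in docs:
        for tag in doc.tags:  # iterating a str yields single characters
            if tag not in seen:
                seen.append(tag)
    return seen


broken = collect_tags(build_docs(wrap_tag_in_list=False))
fixed = collect_tags(build_docs(wrap_tag_in_list=True))

print(broken, len(broken))  # single-character tags
print(fixed, len(fixed))    # one tag per recipe
```

With tags = key the loop walks over the characters of the string, and with tags = [key] it walks over a one-element list, so each recipe key stays intact as a single tag.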


Source: https://stackoverflow.com/questions/47332205/issues-in-doc2vec-tags-in-gensim
