Gensim Doc2Vec Exception AttributeError: 'str' object has no attribute 'words'

≯℡__Kan透↙ submitted on 2020-01-17 07:49:08

Question


I am learning the Doc2Vec model from the gensim library and using it as follows:

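import os
from gensim.models.doc2vec import Doc2Vec, LabeledSentence  # gensim < 4.0; 4.x uses TaggedDocument and vector_size=
from nltk.corpus import stopwords

STOPWORDS = set(stopwords.words('english'))  # build the stopword set once, not once per token

class MyTaggedDocument(object):
    """Re-iterable corpus: every call to __iter__ re-reads all files in dirname."""
    def __init__(self, dirname):
        self.dirname = dirname

    def __iter__(self):
        for fname in os.listdir(self.dirname):
            with open(os.path.join(self.dirname, fname), encoding='utf-8') as fin:
                print(fname)
                for item_no, sentence in enumerate(fin):
                    # words: lowercased tokens with stopwords dropped; tags: one unique tag per line
                    words = [w for w in sentence.lower().split() if w not in STOPWORDS]
                    yield LabeledSentence(words, [fname.split('.')[0].strip() + '_%s' % item_no])

dirname = 'corpus_dir'  # hypothetical path to the directory holding the text files
sentences = MyTaggedDocument(dirname)
model = Doc2Vec(sentences, min_count=2, window=10, size=300, sample=1e-4, negative=5, workers=7)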

The input dirname is a directory path that, for the sake of simplicity, contains only 2 files, each with more than 100 lines. I am getting the following exception: AttributeError: 'str' object has no attribute 'words'

Also, from the print statement I could see that the iterator went over the directory 6 times. Why is that?

Any kind of help would be appreciated.


Answer 1:


It looks like one of the text-example objects, which should be shaped like a TaggedDocument (formerly called LabeledSentence, with words and tags properties), is somehow a plain string instead. Are you 100% certain that the error in your screenshot was generated by exactly the iterable code you've included? (The code here looks like it could only emit acceptable LabeledSentence objects.)
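To narrow down where a plain string sneaks in, a quick check of the corpus before training can help. This is a minimal sketch, not part of gensim: find_bad_documents is a hypothetical helper, and it uses a local namedtuple with the same words/tags shape as gensim's TaggedDocument so that it runs without gensim installed. It assumes, as described above, that Doc2Vec expects every corpus item to expose .words and .tags.

```python
from collections import namedtuple

# Hypothetical stand-in with the same shape as gensim's TaggedDocument
# (a namedtuple with `words` and `tags` fields), so this runs without gensim.
TaggedDocument = namedtuple('TaggedDocument', 'words tags')


def find_bad_documents(corpus, limit=5):
    """Return (index, type name) for up to `limit` items lacking .words or .tags."""
    bad = []
    for i, doc in enumerate(corpus):
        if not (hasattr(doc, 'words') and hasattr(doc, 'tags')):
            bad.append((i, type(doc).__name__))
            if len(bad) >= limit:
                break
    return bad


good = [TaggedDocument(['hello', 'world'], ['doc_0'])]
print(find_bad_documents(good))  # []

# A list of raw strings (or a single string, which iterates as characters)
# yields str items, and a str has no `.words` attribute.
raw = ['hello world', 'another line']
print(find_bad_documents(raw))  # [(0, 'str'), (1, 'str')]

try:
    raw[0].words
except AttributeError as err:
    print(err)  # 'str' object has no attribute 'words'
```

Running this check on the exact object passed to Doc2Vec shows whether any item is a bare string and at which position the first one appears.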

Your corpus iterable is read once for an initial vocabulary scan that discovers all words and tags, and then once more for every training pass. The number of training passes is controlled by the iter parameter (later renamed epochs), whose default in gensim versions of that time was 5. So 1 scan + 5 training passes = 6 iterations over the directory. (10 or more training passes are common with Doc2Vec.)
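The pass count can be reproduced without gensim. This is a minimal sketch assuming the read pattern described above (one vocabulary scan, then one pass per training epoch); CountingCorpus and simulate_doc2vec_passes are hypothetical names, and nothing here trains a model. It also illustrates why the corpus should be a re-iterable object like MyTaggedDocument rather than a one-shot generator.

```python
class CountingCorpus:
    """Re-iterable corpus that counts how many passes have been made over it.

    Like MyTaggedDocument, each call to __iter__ starts a fresh pass.
    """

    def __init__(self, docs):
        self.docs = docs
        self.passes = 0

    def __iter__(self):
        self.passes += 1  # runs when a new pass actually starts
        yield from self.docs


def simulate_doc2vec_passes(corpus, epochs=5):
    """Hypothetical model of the read pattern: one vocabulary scan, then one pass per epoch.

    No training happens; the corpus is only consumed the same number of times.
    Returns how many documents were seen in each pass.
    """
    seen_per_pass = [sum(1 for _ in corpus)]  # vocabulary scan
    for _ in range(epochs):  # training passes
        seen_per_pass.append(sum(1 for _ in corpus))
    return seen_per_pass


corpus = CountingCorpus(['doc_0', 'doc_1'])
print(simulate_doc2vec_passes(corpus, epochs=5))  # [2, 2, 2, 2, 2, 2]
print(corpus.passes)                              # 6

# A one-shot generator is exhausted after the scan, so later passes see nothing.
gen = (d for d in ['doc_0', 'doc_1'])
print(simulate_doc2vec_passes(gen, epochs=5))     # [2, 0, 0, 0, 0, 0]
```

With 5 training passes the counting corpus is read 6 times, matching what the print statement showed. The generator yields documents only during the vocabulary scan and is empty for every training pass.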



Source: https://stackoverflow.com/questions/41223299/gensim-doc2vec-exception-attributeerror-str-object-has-no-attribute-words
