train Gensim word2vec using large txt file

Submitted by 主宰稳场 on 2020-07-10 10:20:26

Question


I have a large txt file (150 MB) like this:

'intrepid', 'bumbling', 'duo', 'deliver', 'good', 'one', 'better', 'offering', 'considerable', 'cv', 'freshly', 'qualified', 'private', ...

I want to train a word2vec model using that file, but it gives me a RAM problem. I don't know how to feed the txt file to the word2vec model. This is my code. I know my code has a problem, but I don't know where it is.

import gensim 


f = open('your_file1.txt')
for line in f:
    b=line
   model = gensim.models.Word2Vec([b],min_count=1,size=32)

w1 = "bad"
model.wv.most_similar (positive=w1)

Answer 1:


You can make an iterator that reads your file one line at a time instead of reading everything in memory at once. The following should work:

from gensim.models import Word2Vec

class SentenceIterator:
    def __init__(self, filepath):
        self.filepath = filepath

    def __iter__(self):
        # Stream the file one line at a time; each line becomes one
        # tokenized sentence, so the whole corpus is never held in memory.
        for line in open(self.filepath):
            yield line.split()

sentences = SentenceIterator('datadir/textfile.txt')
model = Word2Vec(sentences)
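
If you then want the similarity lookup from the question, a minimal sketch (assuming the word "bad" actually occurs in your corpus often enough to survive the min_count cutoff, otherwise a KeyError is raised):

# Query the trained model for the words most similar to "bad"
print(model.wv.most_similar(positive="bad"))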


Source: https://stackoverflow.com/questions/55086734/train-gensim-word2vec-using-large-txt-file
