Question
I have a large txt file (150 MB) like this:
'intrepid', 'bumbling', 'duo', 'deliver', 'good', 'one', 'better', 'offering', 'considerable', 'cv', 'freshly', 'qualified', 'private', ...
I want to train a word2vec model on that file, but I run into RAM problems. I don't know how to feed a txt file to the word2vec model. This is my code; I know it has a problem, but I don't know where it is.
import gensim

f = open('your_file1.txt')
for line in f:
    b = line

model = gensim.models.Word2Vec([b], min_count=1, size=32)
w1 = "bad"
model.wv.most_similar(positive=w1)
Answer 1:
You can make an iterator that reads your file one line at a time instead of reading everything in memory at once. The following should work:
from gensim.models import Word2Vec

class SentenceIterator:
    def __init__(self, filepath):
        self.filepath = filepath

    def __iter__(self):
        # The file is re-opened on every pass, so Word2Vec can scan the corpus
        # once to build the vocabulary and again for each training epoch.
        for line in open(self.filepath):
            yield line.split()

sentences = SentenceIterator('datadir/textfile.txt')
model = Word2Vec(sentences)
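
For the same one-sentence-per-line, whitespace-separated layout that the iterator above assumes, gensim also ships a built-in streaming reader, gensim.models.word2vec.LineSentence. A minimal sketch (the file path and the small vector size are placeholders echoing the question's code):

from gensim.models import Word2Vec
from gensim.models.word2vec import LineSentence

# LineSentence streams the corpus lazily, one whitespace-split line at a
# time, so the 150 MB file never has to be held in RAM in one piece.
sentences = LineSentence('datadir/textfile.txt')

# Note: `vector_size` replaced the old `size` parameter in gensim 4.
model = Word2Vec(sentences, min_count=1, vector_size=32)
print(model.wv.most_similar(positive="bad"))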
Source: https://stackoverflow.com/questions/55086734/train-gensim-word2vec-using-large-txt-file