Gensim word2vec on predefined dictionary and word-indices data

萌比男神i 2021-02-14 05:33

I need to train a word2vec representation on tweets using gensim. Unlike most tutorials and code I've seen on gensim, my data is not raw, but has already been preprocessed. I ha

2 answers

礼貌的吻别 (original poster)

2021-02-14 06:03

I had the same issue. Even converting the data to an array of strings via

>>> arr_str = np.char.mod('%d', arr)

caused an exception when running Word2Vec:

>>> model = Word2Vec(arr_str)
ValueError: The truth value of an array with more than one element is ambiguous.
Use a.any() or a.all()

My solution was to write the integer array to a text file and then train Word2Vec on it through LineSentence.

import numpy as np
from gensim.models import Word2Vec
from gensim.models.word2vec import LineSentence

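# arr is the integer array of word indices from above (assumed 2D, one row per tweet).
# Write it out as one space-separated sentence per line.
np.savetxt('train_data.txt', arr, delimiter=" ", fmt="%s")
# LineSentence streams the file back as lists of string tokens.
sentences = LineSentence('train_data.txt')
model = Word2Vec(sentences)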
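
One caveat with the np.savetxt route is that it needs a rectangular array, while tweets usually differ in length. Below is a minimal sketch of the same "one sentence per line" idea using only the standard library, assuming the data is a list of integer-index sequences of varying length. The helper names and the toy `tweets` data are hypothetical and only for illustration.

```python
import os
import tempfile


def write_line_sentences(sequences, path):
    """Write one sentence per line, tokens separated by single spaces."""
    with open(path, "w", encoding="utf-8") as f:
        for seq in sequences:
            f.write(" ".join(str(idx) for idx in seq) + "\n")


def read_line_sentences(path):
    """Read the file back as lists of string tokens, one list per line."""
    with open(path, encoding="utf-8") as f:
        return [line.split() for line in f]


# Hypothetical toy data: three "tweets" of different lengths, as word indices.
tweets = [[12, 7, 301], [5, 12], [88, 7, 7, 42, 3]]

path = os.path.join(tempfile.mkdtemp(), "train_data.txt")
write_line_sentences(tweets, path)

# Each sentence comes back as a list of string tokens such as "12" and "7".
sentences = read_line_sentences(path)
```

A file written this way can be handed to LineSentence in the same way as train_data.txt above. Note that Word2Vec's min_count parameter defaults to 5, so indices that occur fewer times than that are left out of the vocabulary unless min_count is lowered.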
