Gensim word2vec on predefined dictionary and word-indices data

萌比男神i 2021-02-14 05:33

I need to train a word2vec representation on tweets using gensim. Unlike most tutorials and code I've seen on gensim, my data is not raw but has already been preprocessed. I ha

2 answers
  •  礼貌的吻别
    2021-02-14 06:03

    I had the same issue. Even converting to an array of strings via

    >>> arr_str = np.char.mod('%d', arr)
    

    caused an exception when running Word2Vec:

    >>> model = Word2Vec(arr_str)
    ValueError: The truth value of an array with more than one element is ambiguous.
    Use a.any() or a.all()
    

    My solution was to write the array of integers to a text file, one sentence per line, and then train Word2Vec on that file through LineSentence.

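    import numpy as np
    from gensim.models import Word2Vec
    from gensim.models.word2vec import LineSentence

    # arr is the integer array of word indices, one sentence per row
    np.savetxt('train_data.txt', arr, delimiter=" ", fmt="%d")
    # LineSentence streams the file back as one list of string tokens per line
    sentences = LineSentence('train_data.txt')
    model = Word2Vec(sentences)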
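
The same idea can be sketched without NumPy, assuming the data is a list of rows of integer word indices plus a predefined index-to-word dictionary. The names `index_to_word`, `rows`, `rows_to_tokens` and `write_corpus` are hypothetical and only for illustration. The sketch maps indices back to words, so the trained vocabulary would hold real words rather than index strings, and writes one sentence per line in the layout LineSentence reads.

```python
import os
import tempfile


def rows_to_tokens(rows, index_to_word):
    """Map each row of integer word indices to a list of string tokens."""
    return [[index_to_word[i] for i in row] for row in rows]


def write_corpus(token_rows, path):
    """Write one sentence per line, tokens separated by single spaces."""
    with open(path, "w", encoding="utf-8") as f:
        for tokens in token_rows:
            f.write(" ".join(tokens) + "\n")


# Hypothetical toy data standing in for the predefined dictionary and tweets.
index_to_word = {0: "hello", 1: "world", 2: "gensim"}
rows = [[0, 1], [2, 0, 1]]

token_rows = rows_to_tokens(rows, index_to_word)
corpus_path = os.path.join(tempfile.mkdtemp(), "train_data.txt")
write_corpus(token_rows, corpus_path)

# With gensim installed, either form can then be passed to Word2Vec:
#   Word2Vec(token_rows, min_count=1)                 # in-memory list of lists
#   Word2Vec(LineSentence(corpus_path), min_count=1)  # streamed from disk
```

A plain Python list of lists of strings is also accepted by Word2Vec directly, so the text file is mainly useful when the corpus is too large to hold in memory. `min_count=1` appears in the comments only because the toy data is tiny; with the default of 5, every word here would be dropped from the vocabulary.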
    
