word2vec

My Doc2Vec code, after many loops of training, isn't giving good results. What might be wrong?

纵饮孤独 posted on 2020-07-23 06:53:25
Question: I'm training a Doc2Vec model using the code below, where tagged_data is a list of TaggedDocument instances I set up before:

    max_epochs = 40
    model = Doc2Vec(alpha=0.025, min_alpha=0.001)
    model.build_vocab(tagged_data)
    for epoch in range(max_epochs):
        print('iteration {0}'.format(epoch))
        model.train(tagged_data, total_examples=model.corpus_count, epochs=model.iter)
        # decrease the learning rate
        model.alpha -= 0.001
        # fix the learning rate, no decay
        model.min_alpha = model.alpha
    model.save("d2v
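One thing worth checking in the loop above is plain arithmetic on the learning rate. The sketch below does not use gensim at all; it only replays the `model.alpha -= 0.001` update from the question, starting at 0.025, for 40 iterations. The helper name `manual_alpha_schedule` is hypothetical.

```python
def manual_alpha_schedule(start_alpha, step, iterations):
    """Return the alpha value in effect at the start of each loop iteration."""
    schedule = []
    alpha = start_alpha
    for _ in range(iterations):
        schedule.append(alpha)
        alpha -= step  # mirrors `model.alpha -= 0.001` in the question
    return schedule


schedule = manual_alpha_schedule(start_alpha=0.025, step=0.001, iterations=40)
negative_iterations = [i for i, a in enumerate(schedule) if a < 0]

print("alpha in last iteration:", round(schedule[-1], 6))
print("first iteration with negative alpha:", negative_iterations[0])
```

With these numbers the rate reaches roughly zero after 25 decrements and is negative for the remaining iterations, so the later passes would move the model away from a good solution rather than toward one. Each `train()` call in the question also runs `model.iter` passes by itself, so the total number of passes is 40 times that value.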

How to treat numbers inside text strings when vectorizing words?

别来无恙 posted on 2020-07-18 11:34:37
Question: If I have a text string to be vectorized, how should I handle numbers inside it? Or, if I feed a neural network with numbers and words, how can I keep the numbers as numbers? I am planning on making a dictionary of all my words (as suggested here). In this case all strings will become arrays of numbers. How should I handle characters that are numbers? How do I output a vector that does not mix the word index with the number character? Does converting numbers to strings weaken the information i
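One common way to approach this (an assumption here, not an answer taken from the original thread) is to replace every number with a shared placeholder token for the word-index lookup and to carry the numeric values alongside as a separate feature. The names `NUM_TOKEN`, `NUMBER_RE`, and `encode`, and the example sentence, are hypothetical.

```python
import re

NUM_TOKEN = "<num>"  # hypothetical placeholder shared by all numbers
NUMBER_RE = re.compile(r"^-?\d+(?:\.\d+)?$")


def encode(text, vocab):
    """Map text to (word indices, numeric values).

    All numbers share one vocabulary index; their actual values are kept
    in a parallel list (0.0 for non-numeric tokens).
    """
    indices, values = [], []
    for token in text.lower().split():
        if NUMBER_RE.match(token):
            indices.append(vocab.setdefault(NUM_TOKEN, len(vocab)))
            values.append(float(token))
        else:
            indices.append(vocab.setdefault(token, len(vocab)))
            values.append(0.0)
    return indices, values


vocab = {}
indices, values = encode("order 12 apples and 3.5 pears", vocab)
print(indices)  # [0, 1, 2, 3, 1, 4]
print(values)   # [0.0, 12.0, 0.0, 0.0, 3.5, 0.0]
```

Keeping the two lists separate means the word index never collides with a digit, and the magnitude of each number stays available to the network as a real-valued input.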

train Gensim word2vec using large txt file

主宰稳场 posted on 2020-07-10 10:20:26
Question: I have a large txt file (150 MB) like this:

    'intrepid', 'bumbling', 'duo', 'deliver', 'good', 'one', 'better', 'offering', 'considerable', 'cv', 'freshly', 'qualified', 'private', ...

I want to train a word2vec model using that file, but it gives me a RAM problem. I don't know how to feed a txt file to a word2vec model. This is my code. I know that my code has a problem, but I don't know where it is.

    import gensim
    f = open('your_file1.txt')
    for line in f:
        b=line
        model = gensim.models.Word2Vec([b],min_count
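A possible way to avoid holding the whole file in memory (a sketch, assuming each line holds quoted, comma-separated tokens as in the sample above) is a restartable iterable that yields one list of tokens per line. The class name `QuotedTokenCorpus` and the temporary demo file are hypothetical. gensim's Word2Vec accepts an iterable of token lists as its sentences argument, but that call is left as a comment so the sketch runs with the standard library alone.

```python
import os
import tempfile


class QuotedTokenCorpus:
    """Restartable iterable: yields one list of tokens per line of the file."""

    def __init__(self, path):
        self.path = path

    def __iter__(self):
        with open(self.path, encoding="utf-8") as handle:
            for line in handle:
                # Split on commas, then drop whitespace and surrounding quotes.
                tokens = [t.strip().strip("'\"") for t in line.split(",")]
                tokens = [t for t in tokens if t]
                if tokens:
                    yield tokens


# Small stand-in file for the demonstration.
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("'intrepid', 'bumbling', 'duo'\n'deliver', 'good', 'one'\n")
    demo_path = tmp.name

corpus = QuotedTokenCorpus(demo_path)
first_pass = list(corpus)
second_pass = list(corpus)  # a second pass works, which multi-epoch training needs
print(first_pass)

# Assumed usage with gensim (not executed here):
# model = gensim.models.Word2Vec(sentences=corpus, min_count=1)

os.remove(demo_path)
```

Note that passing `[b]` where `b` is a raw line, as in the question, hands the model a string instead of a list of tokens, so it would be iterated character by character. If the file is really one very long line, reading it line by line will not help and it would need to be split into chunks instead.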

Classification accuracy is too low (Word2Vec)

久未见 posted on 2020-06-29 03:37:06
Question: I'm working on a multi-label emotion classification problem to be solved with word2vec. This is my code, which I've put together from a couple of tutorials. The accuracy is very low, about 0.02, which tells me something is wrong in my code, but I cannot find it. I tried this code with TF-IDF and BOW (obviously without the word2vec part) and got much better accuracy scores, such as 0.28, so it seems this one is somehow wrong:

    np.set_printoptions(threshold=sys.maxsize)
    wv = gensim.models

Combining/adding vectors from different word2vec models

吃可爱长大的小学妹 posted on 2020-06-17 03:53:05
Question: I am using gensim to create Word2Vec models trained on large text corpora. I have some models based on StackExchange data dumps. I also have a model trained on a corpus derived from English Wikipedia. Assume that a vocabulary term is in both models, and that the models were created with the same parameters to Word2Vec. Is there any way to combine or add the vectors from the two separate models to create a single new model that has the same word vectors that would have resulted if I had

How can I count word frequencies in Word2Vec's training model?

怎甘沉沦 posted on 2020-06-01 07:04:05
Question: I need to count the frequency of each word in word2vec's training model. I want to have output that looks like this:

    term      count
    apple     123004
    country   4432180
    runs      620102
    ...

Is it possible to do that? How would I get that data out of word2vec?

Answer 1: Which word2vec implementation are you using? In the popular gensim library, after a Word2Vec model has its vocabulary established (either by doing its full training, or after build_vocab() has been called), the model's wv property contains a
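Independently of any particular word2vec implementation, the same kind of table can be produced directly from the tokenized training corpus. The sketch below shows that swapped-in approach with `collections.Counter`; it is not the gensim route the answer describes, and the toy `sentences` list is made up for illustration.

```python
from collections import Counter

# Toy tokenized corpus standing in for the real training sentences.
sentences = [
    ["apple", "runs", "country"],
    ["apple", "country"],
    ["apple"],
]

# Tally every token across all sentences.
counts = Counter(token for sentence in sentences for token in sentence)

print("term\tcount")
for term, count in counts.most_common():
    print(f"{term}\t{count}")
```

These are raw corpus counts, so they include words that a minimum-count threshold would drop from a trained model's vocabulary.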

How to initialize second glove model with solution from first?

一世执手 posted on 2020-05-30 03:38:38
Question: I am trying to implement one of the solutions to the question "How to align two GloVe models in text2vec?". I don't understand what the proper input values are for GlobalVectors$new(..., init = list(w_i, w_j)). How do I ensure the values for w_i and w_j are correct? Here's a minimal reproducible example. First, prepare some corpora to compare, taken from the quanteda tutorial. I am using dfm_match(all_words) to try and ensure all words are present in each set, but this doesn't seem to

python word2vec not installing

自古美人都是妖i posted on 2020-05-28 13:46:45
Question: I've been trying to install word2vec on my Windows 7 machine using my Python 2.7 interpreter: https://github.com/danielfrg/word2vec

I've tried downloading the zip and running python setup.py install from the unzipped directory, and also running pip install . from there; however, in both instances it returns the errors below:

    Downloading/unpacking word2vec
      Downloading word2vec-0.5.1.tar.gz
      Running setup.py egg_info for package word2vec
        Traceback (most recent call last):
          File "<string>", line 16, in <module>
          File "c