Question:
I'm training a Doc2Vec model using the code below, where tagged_data is a list of TaggedDocument instances I set up beforehand:
max_epochs = 40

model = Doc2Vec(alpha=0.025,
                min_alpha=0.001)

model.build_vocab(tagged_data)

for epoch in range(max_epochs):
    print('iteration {0}'.format(epoch))
    model.train(tagged_data,
                total_examples=model.corpus_count,
                epochs=model.iter)
    # decrease the learning rate
    model.alpha -= 0.001
    # fix the learning rate, no decay
    model.min_alpha = model.alpha

model.save("d2v.model")
print("Model Saved")
When I later check the model results, they're not good. What might have gone wrong?
Answer 1:
Do not call .train() multiple times in your own loop that tries to do alpha arithmetic. It's unnecessary, and it's error-prone.
Specifically, in the above code, decrementing the original 0.025 alpha by 0.001 forty times results in a final alpha of (0.025 - 40*0.001) = -0.015, a value that would also have been negative for many of the training epochs. But a negative alpha learning-rate is nonsensical: it essentially asks the model to nudge its predictions a little bit in the wrong direction, rather than a little bit in the right direction, on every bulk training update. (Further, since model.iter is by default 5, the above code actually performs 40 * 5 = 200 training passes, which probably isn't the conscious intent. But that will just confuse readers of the code and slow training, not totally sabotage results the way the alpha mishandling does.)
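To make that arithmetic concrete, here's a tiny standalone sketch (plain Python, no gensim required; the variable names are just for illustration) of the learning-rate schedule the loop above produces:

# Reproduce the schedule created by the question's loop: each outer
# iteration trains with the current alpha, then subtracts 0.001 from it.
start_alpha, decrement, max_epochs = 0.025, 0.001, 40
alphas_used = [start_alpha - i * decrement for i in range(max_epochs)]
print(round(alphas_used[0], 3))   # 0.025  -> rate for the first outer iteration
print(round(alphas_used[-1], 3))  # -0.014 -> rate for the last iteration is already negative
print(round(start_alpha - max_epochs * decrement, 3))  # -0.015 -> value left after the loop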
There are other common variants of this error, as well. If the alpha were instead decremented by 0.0001, the 40 decrements would only reduce the final alpha to 0.021, whereas the proper practice for this style of SGD (Stochastic Gradient Descent) with linear learning-rate decay is for the value to end very close to 0.000. And if users start tinkering with max_epochs (it is, after all, a parameter pulled out on top!) but don't also adjust the decrement every time, they are likely to far-undershoot or far-overshoot 0.000.
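If someone nevertheless wanted a manual schedule that ends near zero (again, not recommended), the decrement would have to be derived from max_epochs rather than hard-coded. A hypothetical sketch of that arithmetic, shown only for contrast:

# Hypothetical manual schedule, for contrast only: derive the per-epoch
# decrement from the chosen epoch count so the rate ends near 0.0.
start_alpha, end_alpha, max_epochs = 0.025, 0.0, 40
decrement = (start_alpha - end_alpha) / max_epochs      # 0.000625 per epoch here
alphas_used = [start_alpha - i * decrement for i in range(max_epochs)]
print(round(alphas_used[0], 6), round(alphas_used[-1], 6))  # 0.025 ... 0.000625 (near zero, never negative)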
So don't use this pattern.
Unfortunately, many bad online examples have copied this anti-pattern from each other, and they make serious errors in their own epochs and alpha handling. Please don't copy their error, and please let their authors know they're misleading people wherever this problem appears.
The above code can be improved with the much-simpler replacement:
max_epochs = 40
model = Doc2Vec() # of course, if non-default parameters needed, use them here
model.build_vocab(tagged_data)
model.train(tagged_data, total_examples=model.corpus_count, epochs=max_epochs)
model.save("d2v.model")
Here, the .train() method will do exactly the requested number of epochs, smoothly reducing the internal effective alpha from its default starting value to near-zero. (It's rare to need to change the starting alpha, but even if you wanted to, just setting a new non-default value at initial model-creation is enough.)
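For instance, if a non-default starting rate really were needed, it could be passed once at model creation and the single .train() call would still handle the decay (the values below are purely illustrative, not recommendations):

# Illustrative values only: non-default starting/ending rates set once at creation;
# one .train() call still decays alpha smoothly from 0.05 down toward 0.0001.
model = Doc2Vec(alpha=0.05, min_alpha=0.0001)
model.build_vocab(tagged_data)
model.train(tagged_data, total_examples=model.corpus_count, epochs=max_epochs)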
Source: https://stackoverflow.com/questions/62801052/my-doc2vec-code-after-many-loops-of-training-isnt-giving-good-results-what-m