Word2vec fine-tuning

孤城傲影 2021-01-07 01:41

I need to fine-tune my word2vec model. I have two datasets, data1 and data2.

What I did so far is:

model = gensim.models.Word2Vec(min_count=1)
model.build_vocab(data1)
model.train(data1, total_examples=len(data1), epochs=epochs)
model.train(data2, total_examples=len(data2), epochs=epochs)
3 Answers
  •  礼貌的吻别
    2021-01-07 02:05

    Is this correct?

    Yes, it is. You need to make sure that data2's words are in the vocabulary built from data1. If they aren't, any words not present in the vocabulary will be lost.

    Note that the weights that will be computed by

    model.train(data1, total_examples=len(data1), epochs=epochs)

    and

    model.train(data2, total_examples=len(data2), epochs=epochs)

    won't be equal to the weights computed by

    model.train(data1+data2, total_examples=len(data1+data2), epochs=epochs)
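    To avoid losing data2's out-of-vocabulary words between the two train calls, gensim lets you grow the vocabulary first with build_vocab(..., update=True). A minimal sketch, assuming gensim 4.x; data1 and data2 here are toy tokenized corpora for illustration only:

    ```python
    from gensim.models import Word2Vec

    # Toy tokenized corpora standing in for data1 and data2 (illustration only).
    data1 = [["cat", "sat", "mat"], ["dog", "sat", "log"]] * 50
    data2 = [["cat", "chased", "mouse"], ["dog", "chased", "cat"]] * 50

    model = Word2Vec(vector_size=50, min_count=1, seed=1)
    model.build_vocab(data1)
    model.train(data1, total_examples=len(data1), epochs=5)

    # Grow the vocabulary with data2's new words ("chased", "mouse");
    # without this, unknown words are silently skipped during training.
    model.build_vocab(data2, update=True)
    model.train(data2, total_examples=len(data2), epochs=5)
    ```

    After the second call, the fine-tuned model contains vectors for data2's new words as well.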

    Do I need to store learned weights somewhere?

    No, you don't need to.

    But if you want, you can save the model to a file so you can use it later.

    model.save("word2vec.model")
    

    And you can load it back later with

    model = Word2Vec.load("word2vec.model")
    

    (source)
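    Put together, the save/load round trip looks like this. A sketch, assuming gensim 4.x; the filename comes from the snippet above, while the toy corpus and temp-directory handling are illustrative:

    ```python
    import os
    import tempfile

    from gensim.models import Word2Vec

    # Toy tokenized corpus (illustration only).
    sentences = [["hello", "world"], ["goodbye", "world"]] * 20

    model = Word2Vec(sentences, vector_size=32, min_count=1)
    path = os.path.join(tempfile.mkdtemp(), "word2vec.model")
    model.save(path)

    # Reload and continue training where you left off.
    restored = Word2Vec.load(path)
    restored.train(sentences, total_examples=len(sentences), epochs=2)
    ```

    Word2Vec.load restores the full training state (vocabulary, weights, hyperparameters), so further train calls pick up from the saved model.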

    I need to fine tune my word2vec model.

    Note that "Word2vec training is an unsupervised task, there's no good way to objectively evaluate the result. Evaluation depends on your end application." (source) But there are some evaluations you can look up here (see the "How to measure quality of the word vectors" section).
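    One lightweight sanity check is to query a trained model's nearest neighbours and eyeball whether they look plausible. A sketch with a toy corpus, not a rigorous evaluation:

    ```python
    from gensim.models import Word2Vec

    # Toy tokenized corpus (illustration only).
    sentences = [["king", "queen", "royal"], ["man", "woman", "person"]] * 100

    model = Word2Vec(sentences, vector_size=32, min_count=1, seed=1)

    # Nearest neighbours by cosine similarity; inspect whether they make sense.
    neighbours = model.wv.most_similar("king", topn=3)
    print(neighbours)  # list of (word, cosine similarity) pairs
    ```

    For a more systematic check, gensim also ships analogy and word-similarity evaluation helpers on the KeyedVectors object.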

    Hope that helps!
