How to save fasttext model in vec format?

伪装坚强ぢ 2021-01-12 14:37

I trained my unsupervised model using the fasttext.train_unsupervised() function in Python. I want to save it as a .vec file since I will use this file for pretr…

2 answers
  • 2021-01-12 15:36

    You should add the number of words and the vector dimension on the first line of your .vec file, then use the -pretrainedVectors parameter.
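    A minimal sketch of that first step, assuming you already have a headerless text file in which every line is a word followed by its space-separated vector components (fastText's print-word-vectors command, for example, writes lines of this shape). The helper names `with_vec_header` and `add_vec_header_to_file` are hypothetical, not part of fastText.

```python
def with_vec_header(lines):
    """Return the rows of a word-vector file with the .vec header prepended.

    Each input line is assumed to be "word c1 c2 ... cN" (space-separated);
    blank lines are skipped and trailing whitespace is stripped.
    """
    rows = [line.rstrip() for line in lines if line.strip()]
    if not rows:
        raise ValueError("no vectors found")

    # Dimension = number of fields after the word on the first row.
    dim = len(rows[0].split(" ")) - 1
    for row in rows:
        if len(row.split(" ")) - 1 != dim:
            raise ValueError("inconsistent dimension in row: " + row[:40])

    # Header "<number of words> <dimension>", followed by the rows unchanged.
    return [f"{len(rows)} {dim}"] + rows


def add_vec_header_to_file(src_path, dst_path):
    """Write a copy of src_path to dst_path with the .vec header line added."""
    with open(src_path, encoding="utf-8") as src:
        out = with_vec_header(src)
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write("\n".join(out) + "\n")
```

    With the header in place, the file can be passed to supervised training, e.g. `fasttext.train_supervised(input="train.txt", dim=100, pretrainedVectors="vectors.vec")` in Python or `-pretrainedVectors vectors.vec` on the command line (the file names and `100` are placeholders). `dim` must equal the dimension in the header, otherwise fastText refuses to load the vectors.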

  • 2021-01-12 15:37

    To obtain a VEC file containing only the word vectors, I took inspiration from the official bin_to_vec example.

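    # the PyPI package is named "fasttext"; older releases used "fastText"
    from fasttext import load_model
    
    # original BIN model loading (replace both placeholder paths with your own)
    f = load_model("YOUR-BIN-MODEL-PATH")
    
    # get all words from model
    words = f.get_words()
    
    # write as UTF-8 so that non-ASCII words cannot raise an encoding error
    with open("YOUR-VEC-FILE-PATH", "w", encoding="utf-8") as file_out:
        
        # the first line must contain number of total words and vector dimension
        file_out.write(str(len(words)) + " " + str(f.get_dimension()) + "\n")
    
        # line by line, you append vectors to VEC file
        for w in words:
            v = f.get_word_vector(w)
            vstr = ""
            for vi in v:
                vstr += " " + str(vi)
            # no try/except here: silently skipping a word would make the
            # word count written in the first line wrong
            file_out.write(w + vstr + "\n")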
    

    The resulting VEC file can be large. To reduce its size, you can write each vector component with fewer decimal digits.

    If you want to keep only 4 decimal digits, replace

        vstr += " " + str(vi)

    with

        vstr += " " + "{:.4f}".format(vi)
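    To see what that rounding trades away, here is a small sketch comparing the two formats on a few sample values (the function name `format_components` and the numbers are illustrative only):

```python
def format_components(vector, decimals=None):
    """Join vector components with single spaces.

    With decimals=None each value is written with str(); otherwise it is
    rounded to a fixed number of decimal places.
    """
    if decimals is None:
        return " ".join(str(x) for x in vector)
    return " ".join(f"{x:.{decimals}f}" for x in vector)


vec = [0.123456789, -0.98761, 1.5]
print(format_components(vec))     # 0.123456789 -0.98761 1.5
print(format_components(vec, 4))  # 0.1235 -0.9876 1.5000
```

    Rounding is lossy, and short values such as `1.5` actually get longer (`1.5000`), so the saving comes from components that `str()` would print with many digits.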
