Keras: initialize a large embedding layer with pretrained embeddings

自闭症患者 2021-02-08 13:38

I am trying to re-train a word2vec model in Keras 2 with the TensorFlow backend, using pretrained embeddings and a custom corpus.

This is how I initialize the embedding layer:
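
A minimal sketch of such an initialization (assuming the embeddings_initializer argument with a Constant initializer; vocab_size, EMBEDDING_DIM, embedding_matrix and MAX_SEQUENCE_LENGTH are placeholders):

    from keras.initializers import Constant
    from keras.layers import Embedding

    # embedding_matrix: a NumPy array of shape (vocab_size, EMBEDDING_DIM)
    embedding_layer = Embedding(vocab_size,
                                EMBEDDING_DIM,
                                embeddings_initializer=Constant(embedding_matrix),
                                input_length=MAX_SEQUENCE_LENGTH)

With a large pretrained matrix this route can fail, because the constant is baked into the TensorFlow graph, which is limited to 2GB.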

1 Answer
  • 2021-02-08 14:23

    Instead of using the embeddings_initializer argument of the Embedding layer, you can load pre-trained weights for your embedding layer using the weights argument; this way you should be able to hand over pre-trained embeddings larger than 2GB.

    Here is a short example:

    from keras.layers import Embedding

    # vocab_size, EMBEDDING_DIM, embedding_matrix and MAX_SEQUENCE_LENGTH
    # come from your own preprocessing.
    embedding_layer = Embedding(vocab_size,
                                EMBEDDING_DIM,
                                weights=[embedding_matrix],        # pre-trained weights
                                input_length=MAX_SEQUENCE_LENGTH,
                                trainable=False)                   # freeze the embeddings
    

    Here, embedding_matrix is just a regular NumPy matrix containing your weights.
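
    For illustration, here is one common way to build such a matrix from pretrained word2vec vectors (a sketch, assuming gensim's KeyedVectors and a Keras-Tokenizer-style word_index mapping; the file name is a placeholder):

    import numpy as np
    from gensim.models import KeyedVectors

    # Placeholder path; substitute your own pretrained vectors.
    w2v = KeyedVectors.load_word2vec_format('pretrained_vectors.bin', binary=True)

    EMBEDDING_DIM = w2v.vector_size
    embedding_matrix = np.zeros((vocab_size, EMBEDDING_DIM))
    for word, i in word_index.items():  # word_index: token -> row index
        if i < vocab_size and word in w2v:
            embedding_matrix[i] = w2v[word]  # rows for unseen words stay zero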

    For further examples, you can also take a look here:
    https://blog.keras.io/using-pre-trained-word-embeddings-in-a-keras-model.html


    Edit:

    As @PavlinMavrodiev (see the end of the question) correctly pointed out, the weights argument is deprecated. He used the layer method set_weights to set the weights instead:

    • layer.set_weights(weights): sets the weights of the layer from a list of Numpy arrays (with the same shapes as the output of get_weights).

    To get the trained weights, get_weights can be used:

    • layer.get_weights(): returns the weights of the layer as a list of Numpy arrays.

    Both are methods of the Keras Layer base class and can be used with all Keras layers, including the embedding layer.
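
    Applied to the example above, that could look like the following sketch (same placeholder names as before; note that the layer must be built before its weights can be set):

    import numpy as np
    from keras.layers import Embedding

    embedding_layer = Embedding(vocab_size,
                                EMBEDDING_DIM,
                                input_length=MAX_SEQUENCE_LENGTH,
                                trainable=False)
    embedding_layer.build((None,))  # create the weight variables first
    embedding_layer.set_weights([embedding_matrix])

    # get_weights returns a list of NumPy arrays, here with a single entry:
    assert np.allclose(embedding_layer.get_weights()[0], embedding_matrix)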
