Keras - Embedding layer

梦如初夏 2021-02-06 05:06

What does input_dim, output_dim and input_length mean in:

Embedding(input_dim, output_dim, input_length)

3 Answers
  • 2021-02-06 05:28

    In order to use words for natural language processing or machine learning tasks, it is necessary to first map them onto a continuous vector space, thus creating word vectors or word embeddings. The Keras Embedding layer is useful for constructing such word vectors.

    input_dim : the vocabulary size. This is how many unique words are represented in your corpus; every word index in the input must be smaller than input_dim.

    output_dim : the desired dimension of the word vector. For example, if output_dim = 100, then every word will be mapped onto a vector with 100 elements, whereas if output_dim = 300, then every word will be mapped onto a vector with 300 elements.

    input_length : the length of your sequences. For example, if your data consists of sentences, then this variable represents how many words there are in a sentence. As different sentences typically contain different numbers of words, it is usually necessary to pad your sequences so that all sentences are of equal length. The keras.preprocessing.sequence.pad_sequences function can be used for this (https://keras.io/preprocessing/sequence/).
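
    A minimal sketch of such padding (the sentences and the maxlen value below are made up for illustration):

    from tensorflow.keras.preprocessing.sequence import pad_sequences

    # Two sentences already converted to integer word indices (made-up values).
    sequences = [[1, 2, 3, 4], [1, 2, 3, 5, 6, 4, 7, 8]]

    # Pad (or truncate) every sequence to the same length; this common length
    # then becomes the input_length of the Embedding layer.
    padded = pad_sequences(sequences, maxlen=6)
    print(padded)
    # [[0 0 1 2 3 4]
    #  [3 5 6 4 7 8]]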

    In Keras, it is possible to either 1) use pretrained word vectors such as GloVe or word2vec representations, or 2) learn the word vectors as part of the training process. This blog post (https://blog.keras.io/using-pre-trained-word-embeddings-in-a-keras-model.html) offers a tutorial on how to use GloVe pretrained word vectors. For option 2, Keras will randomly initialize vectors as the default option, and then learn optimal word vectors during the training process.
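
    A rough sketch of option 1 (the vocabulary size, embedding dimension, and embedding_matrix below are placeholders; loading the actual GloVe file is omitted):

    import numpy as np
    from tensorflow.keras.layers import Embedding
    from tensorflow.keras.initializers import Constant

    vocab_size = 10000   # placeholder: your input_dim
    embedding_dim = 100  # placeholder: must match the pretrained vectors

    # In practice this matrix is filled row by row from the GloVe file,
    # one row per word index in your vocabulary.
    embedding_matrix = np.zeros((vocab_size, embedding_dim))

    embedding_layer = Embedding(input_dim=vocab_size,
                                output_dim=embedding_dim,
                                embeddings_initializer=Constant(embedding_matrix),
                                trainable=False)  # keep the pretrained vectors fixed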

  • 2021-02-06 05:38
    • input_dim: the size of the vocabulary to embed
    • output_dim: the length of each embedding vector
    • input_length: the maximum length of the input (sentence)

    As shown in Explain with example: how embedding layers in keras works, you can turn a sentence into a list of integers (a vector or tensor). Example with an input_length of 6 (the maximum sentence length; if a sentence is longer, the remaining words are trimmed):

     'This is a text' --> [0 0 1 2 3 4]
     'This is a very long text, my friends' --> [1 2 3 5 6 4]
    

    Then, using Keras's Embedding layer, you can turn these vectors into embedding vectors of depth output_dim. For example, with output_dim = 3:

    [0 0 1 2 3 4] --> 
    array([[ 0.00251105,  0.00724941, -0.01146401],
       [ 0.00251105,  0.00724941, -0.01146401],
       [ 0.03071865,  0.00953215, -0.01349484],
       [ 0.02962008,  0.04860269, -0.04597988],
       [-0.01875228,  0.03349927, -0.03210936],
       [-0.02512982,  0.04811014,  0.03172458]], dtype=float32)
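
    A minimal sketch that produces output of this shape (the weights are randomly initialized, so the exact numbers will differ):

    import numpy as np
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Embedding

    model = Sequential([Embedding(input_dim=10, output_dim=3, input_length=6)])

    sentence = np.array([[0, 0, 1, 2, 3, 4]])  # one padded sentence of 6 word indices
    vectors = model.predict(sentence)
    print(vectors.shape)  # (1, 6, 3): 6 words, each mapped to a 3-dimensional vector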
    

    The last parameter, input_dim, is the size of the vocabulary that is mapped to embedding vectors. You can see it by running

    model.layers[0].get_weights() 
    

    since the Embedding layer is usually the first layer of the model. If input_dim were 10, the embedding layer would contain ten vectors of size output_dim. Notice that the first row corresponds to the mapping of 0 in the input vector (0 --> [ 0.00251105, 0.00724941, -0.01146401]), the second to 1, and so on.

    [array([[ 0.00251105,  0.00724941, -0.01146401],
        [ 0.03071865,  0.00953215, -0.01349484],
        [ 0.02962008,  0.04860269, -0.04597988],
        [-0.01875228,  0.03349927, -0.03210936],
        [-0.02512982,  0.04811014,  0.03172458],
        [-0.00569617, -0.02348857, -0.00098624],
        [ 0.01327456,  0.02390958,  0.00754261],
        [-0.04041355,  0.03457253, -0.02879228],
        [-0.02695872,  0.02807242,  0.03338097],
        [-0.02057508,  0.00174383,  0.00792078]], dtype=float32)]
    

    Increasing input_dim allows you to map a bigger vocabulary, but it also increases the number of parameters of the embedding layer. The number of parameters is input_dim x output_dim.
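
    A quick self-contained check of that count, reusing input_dim = 10 and output_dim = 3 from the example above:

    from tensorflow.keras.layers import Embedding

    layer = Embedding(input_dim=10, output_dim=3)
    layer.build((None, 6))        # create the (10, 3) weight matrix
    print(layer.count_params())   # 30 == input_dim * output_dim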

    As far as I understand, these vectors are initialized randomly and trained like any other layer using the optimizer's algorithm. However, you can also use different algorithms such as word2vec, or pretrained vectors such as GloVe (https://nlp.stanford.edu/projects/glove/). The idea is that each word occupies a unique position in the vector space (described by its vector), so that you can apply some vector math to a word's semantics (meaning), e.g. W('cheeseburger') - W('cheese') = W('hamburger') or W('prince') - W('man') + W('woman') = W('princess'); see more e.g. at https://www.oreilly.com/learning/capturing-semantic-meanings-using-deep-learning
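
    For example, with pretrained vectors loaded into gensim (a separate library, not part of Keras; the file path below is a placeholder for GloVe vectors converted to word2vec text format), you could explore such analogies:

    from gensim.models import KeyedVectors

    # Placeholder path: pretrained GloVe vectors in word2vec text format.
    vectors = KeyedVectors.load_word2vec_format('glove.6B.100d.word2vec.txt')

    # Roughly W('prince') - W('man') + W('woman') ~ W('princess')
    print(vectors.most_similar(positive=['prince', 'woman'], negative=['man'], topn=3))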

  • 2021-02-06 05:44

    By taking a look at the Keras documentation for the layer, you see this:

    Embedding(1000, 64, input_length=10)
    #the model will take as input an integer matrix of size (batch, input_length).
    #the largest integer (i.e. word index) in the input should be no larger than 999 (vocabulary size).
    #now model.output_shape == (None, 10, 64), where None is the batch dimension.
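
    A runnable version of that check (same numbers as the docs snippet):

    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Embedding

    model = Sequential([Embedding(1000, 64, input_length=10)])
    model.build(input_shape=(None, 10))
    print(model.output_shape)  # (None, 10, 64)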
    

    By using the values you gave in your post, you can try to grasp the idea of this layer and come up with these settings:

    • input_dim=38
    • input_length=75

    while output_dim is a model hyperparameter which you still have to determine (you may have to try different values to find the optimal one).

    Edit: You can find additional information about embedding layers here.
