Keras - Embedding layer

梦如初夏 2021-02-06 05:06

What do input_dim, output_dim and input_length mean in:

Embedding(input_dim, output_dim, input_length)

3 Answers
  •  一生所求
    2021-02-06 05:38

    • input_dim: the size of the vocabulary being embedded
    • output_dim: the length of each embedding vector
    • input_length: the maximum length of the input sequence (sentence)
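
    A minimal sketch of how these three arguments fit together (the values 10, 3 and 6 are just example numbers, not anything fixed by the question):

    from tensorflow.keras.layers import Embedding

    # vocabulary of 10 token ids, each mapped to a 3-dimensional vector,
    # applied to integer sequences of length 6
    embedding_layer = Embedding(input_dim=10, output_dim=3, input_length=6)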

    As shown in Explain with example: how embedding layers in keras works, you can turn a sentence into a list of integers (a vector or tensor). Example of vectors with input_length = 6 (the maximum sentence length; if your sentence is longer, the remaining words are trimmed):

     'This is a text' --> [0 0 1 2 3 4]
     'This is a very long text, my friends' --> [1 2 3 5 6 4]
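
    A small sketch of how such integer vectors can be produced with Keras's own preprocessing utilities (the exact integer ids will differ from the example above, since the Tokenizer assigns ids by word frequency):

    from tensorflow.keras.preprocessing.text import Tokenizer
    from tensorflow.keras.preprocessing.sequence import pad_sequences

    texts = ['This is a text', 'This is a very long text, my friends']

    tokenizer = Tokenizer()
    tokenizer.fit_on_texts(texts)
    sequences = tokenizer.texts_to_sequences(texts)

    # pad short sentences with 0 at the front, cut long ones after 6 tokens
    padded = pad_sequences(sequences, maxlen=6, padding='pre', truncating='post')
    print(padded)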
    

    Then, using Keras's embedding layer, you can turn these vectors into embedding vectors of depth output_dim. For example, with output_dim = 3:

    [0 0 1 2 3 4] --> 
    array([[ 0.00251105,  0.00724941, -0.01146401],
       [ 0.00251105,  0.00724941, -0.01146401],
       [ 0.03071865,  0.00953215, -0.01349484],
       [ 0.02962008,  0.04860269, -0.04597988],
       [-0.01875228,  0.03349927, -0.03210936],
       [-0.02512982,  0.04811014,  0.03172458]], dtype=float32)
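
    One way to reproduce this kind of output is a minimal sketch like the following (the printed numbers will differ, because the embedding weights are initialized randomly):

    import numpy as np
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Embedding

    model = Sequential([Embedding(input_dim=10, output_dim=3, input_length=6)])

    # one padded sentence of length 6 -> one 6 x 3 matrix of embedding vectors
    out = model.predict(np.array([[0, 0, 1, 2, 3, 4]]))
    print(out.shape)   # (1, 6, 3)
    print(out[0])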
    

    The remaining parameter, input_dim, is the size of the vocabulary that is mapped to embedding vectors. You can see it by running

    model.layers[0].get_weights() 
    

    since the embedding layer is usually the first layer of the model. If input_dim were 10, the embedding layer would contain ten vectors of size output_dim. Notice that the first row corresponds to the mapping of 0 in the input vector (0 --> [ 0.00251105, 0.00724941, -0.01146401]), the second to 1, and so on.

    [array([[ 0.00251105,  0.00724941, -0.01146401],
        [ 0.03071865,  0.00953215, -0.01349484],
        [ 0.02962008,  0.04860269, -0.04597988],
        [-0.01875228,  0.03349927, -0.03210936],
        [-0.02512982,  0.04811014,  0.03172458],
        [-0.00569617, -0.02348857, -0.00098624],
        [ 0.01327456,  0.02390958,  0.00754261],
        [-0.04041355,  0.03457253, -0.02879228],
        [-0.02695872,  0.02807242,  0.03338097],
        [-0.02057508,  0.00174383,  0.00792078]], dtype=float32)]
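
    Assuming the small model sketched earlier, you can check this mapping directly (the printed numbers are again just randomly initialized values):

    weights = model.layers[0].get_weights()[0]   # shape (input_dim, output_dim), here (10, 3)
    print(weights[0])   # the vector that token id 0 is mapped to
    print(weights[4])   # the vector that token id 4 is mapped to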
    

    Increasing input_dim lets you map a bigger vocabulary, but it also increases the number of parameters of the embedding layer. The number of parameters is input_dim x output_dim.
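
    As a quick check of that formula, still assuming the 10-by-3 example model above:

    print(model.layers[0].count_params())   # 10 * 3 = 30 trainable parameters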

    As far as I understand, these vectors are initialized randomly and trained like any other layer by the optimizer. You can, however, use different algorithms such as word2vec, or pretrained vectors such as GloVe (https://nlp.stanford.edu/projects/glove/). The idea is that each word occupies a unique position in the space (described by its vector), so you can apply vector math to the words' semantics (meaning), e.g. W('cheeseburger') - W('cheese') = W('hamburger'), or W('prince') - W('man') + W('woman') = W('princess'). See more, for example, at https://www.oreilly.com/learning/capturing-semantic-meanings-using-deep-learning
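
    A common way to plug pretrained GloVe vectors into the layer looks roughly like the sketch below; the file name, vocab_size and embedding_dim are assumptions you would adapt to your own data, and it reuses the tokenizer from the preprocessing sketch above for the word-to-id mapping:

    import numpy as np
    from tensorflow.keras.layers import Embedding

    embedding_dim = 100                          # matches the glove.6B.100d.txt file (assumed)
    vocab_size = 10000                           # assumed size of your own vocabulary

    # build an index from each word to its pretrained GloVe vector
    embeddings_index = {}
    with open('glove.6B.100d.txt', encoding='utf-8') as f:
        for line in f:
            word, *vec = line.split()
            embeddings_index[word] = np.asarray(vec, dtype='float32')

    # row i of the matrix holds the vector of the word with id i
    embedding_matrix = np.zeros((vocab_size, embedding_dim))
    for word, i in tokenizer.word_index.items():     # tokenizer from the sketch above
        if i < vocab_size and word in embeddings_index:
            embedding_matrix[i] = embeddings_index[word]

    embedding_layer = Embedding(vocab_size, embedding_dim,
                                weights=[embedding_matrix],
                                trainable=False)     # keep the pretrained vectors fixed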
