What do input_dim, output_dim and input_length mean in:
Embedding(input_dim, output_dim, input_length)
As shown in Explain with example: how embedding layers in keras works, you can turn a sentence into a list of integers (a vector or tensor). Here is an example with input_length = 6 (the maximum length of a sentence; if your sentence is longer, the remaining words are trimmed):
'This is a text' --> [0 0 1 2 3 4]
'This is a very long text, my friends' --> [1 2 3 5 6 4]
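For completeness, here is one way such vectors can be produced with Keras' Tokenizer and pad_sequences (a minimal sketch; the integer assigned to each word depends on the tokenizer's word index, so the indices won't necessarily match the ones above):

from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

sentences = ['This is a text', 'This is a very long text, my friends']

tokenizer = Tokenizer()
tokenizer.fit_on_texts(sentences)                    # build the word -> integer mapping
sequences = tokenizer.texts_to_sequences(sentences)

# pad/trim every sequence to input_length = 6:
# 'pre' padding adds zeros at the front, 'post' truncating drops the extra words at the end
padded = pad_sequences(sequences, maxlen=6, padding='pre', truncating='post')
print(padded)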
Then, using Keras' Embedding layer, you can turn these vectors into embedding vectors of depth output_dim. For example, with output_dim = 3:
[0 0 1 2 3 4] -->
array([[ 0.00251105, 0.00724941, -0.01146401],
[ 0.00251105, 0.00724941, -0.01146401],
[ 0.03071865, 0.00953215, -0.01349484],
[ 0.02962008, 0.04860269, -0.04597988],
[-0.01875228, 0.03349927, -0.03210936],
[-0.02512982, 0.04811014, 0.03172458]], dtype=float32)
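Such output can be produced with a model whose first layer is an Embedding layer, for example (a minimal sketch assuming a vocabulary size of 10; the actual numbers will differ because the weights are initialized randomly):

import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding

model = Sequential()
model.add(Embedding(input_dim=10, output_dim=3, input_length=6))
model.compile('rmsprop', 'mse')

vec = np.array([[0, 0, 1, 2, 3, 4]])   # shape (1, 6): one padded sentence
print(model.predict(vec)[0])            # shape (6, 3): one 3-dim vector per token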
The first parameter, input_dim, is the size of the vocabulary mapped to embedding vectors. You can see it by running
model.layers[0].get_weights()
since the embedding layer is usually the first layer of the model. If input_dim were 10, the embedding layer would contain ten vectors of size output_dim. Notice that the first row corresponds to the mapping of 0 in the input vector (0 --> [ 0.00251105, 0.00724941, -0.01146401]), the second to 1, and so on.
[array([[ 0.00251105, 0.00724941, -0.01146401],
[ 0.03071865, 0.00953215, -0.01349484],
[ 0.02962008, 0.04860269, -0.04597988],
[-0.01875228, 0.03349927, -0.03210936],
[-0.02512982, 0.04811014, 0.03172458],
[-0.00569617, -0.02348857, -0.00098624],
[ 0.01327456, 0.02390958, 0.00754261],
[-0.04041355, 0.03457253, -0.02879228],
[-0.02695872, 0.02807242, 0.03338097],
[-0.02057508, 0.00174383, 0.00792078]], dtype=float32)]
Increasing input_dim allows you to map a bigger vocabulary, but it also increases the number of parameters of the embedding layer. The number of parameters is input_dim x output_dim.
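Continuing the sketch above (input_dim = 10, output_dim = 3), you can check both points directly:

weights = model.layers[0].get_weights()[0]
print(weights.shape)                   # (10, 3), i.e. (input_dim, output_dim)
print(model.layers[0].count_params())  # 30 = input_dim x output_dim
print(weights[0])                      # the embedding of token 0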
As far as I understand, these vectors are initialized randomly and trained like any other layer using the optimizer's algorithm. You can, however, use different algorithms like word2vec, or pretrained vectors like GloVe (https://nlp.stanford.edu/projects/glove/). The idea is that each word represents a unique position in the space (described by its vector), so that you can apply some vector math to the word's semantics (meaning), e.g. W('cheeseburger') - W('cheese') = W('hamburger'), or W('prince') - W('man') + W('woman') = W('princess'). See more e.g. at https://www.oreilly.com/learning/capturing-semantic-meanings-using-deep-learning
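If you want to use pretrained vectors, a common pattern is to pass them as the initial weights of the Embedding layer and freeze it. A minimal sketch (vocab_size, embed_dim and the all-zeros embedding_matrix are placeholders; in practice you would fill the matrix from the GloVe file using your tokenizer's word index):

import numpy as np
from tensorflow.keras.layers import Embedding

vocab_size = 10   # input_dim
embed_dim = 50    # must match the dimension of the pretrained vectors

# one pretrained vector per token index, shape (vocab_size, embed_dim)
embedding_matrix = np.zeros((vocab_size, embed_dim))

embedding_layer = Embedding(input_dim=vocab_size,
                            output_dim=embed_dim,
                            input_length=6,
                            weights=[embedding_matrix],  # start from the pretrained vectors
                            trainable=False)             # freeze them instead of training further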