What do input_dim, output_dim and input_length mean in:
Embedding(input_dim, output_dim, input_length)
As shown in Explain with example: how embedding layers in keras works, you can turn a sentence into a list of integers (a vector or tensor). Here is an example with input_length = 6 (the maximum length of a sentence; if your sentence is longer, the remaining words are trimmed):
'This is a text' --> [0 0 1 2 3 4]
'This is a very long text, my friends' --> [1 2 3 5 6 4]
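For completeness, here is one way such vectors can be produced with Keras' Tokenizer and pad_sequences (a minimal sketch; the integer assigned to each word depends on the tokenizer's word index, so the indices won't necessarily match the ones above):

from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

sentences = ['This is a text', 'This is a very long text, my friends']

tokenizer = Tokenizer()
tokenizer.fit_on_texts(sentences)                    # build the word -> integer mapping
sequences = tokenizer.texts_to_sequences(sentences)

# pad/trim every sequence to input_length = 6:
# 'pre' padding adds zeros at the front, 'post' truncating drops the extra words at the end
padded = pad_sequences(sequences, maxlen=6, padding='pre', truncating='post')
print(padded)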
Then, using Keras' Embedding layer, you can turn these vectors into embedding vectors of depth output_dim. For example, with output_dim = 3:
[0 0 1 2 3 4] -->
array([[ 0.00251105, 0.00724941, -0.01146401],
[ 0.00251105, 0.00724941, -0.01146401],
[ 0.03071865, 0.00953215, -0.01349484],
[ 0.02962008, 0.04860269, -0.04597988],
[-0.01875228, 0.03349927, -0.03210936],
[-0.02512982, 0.04811014, 0.03172458]], dtype=float32)
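Such output can be produced with a model whose first layer is an Embedding layer, for example (a minimal sketch assuming a vocabulary size of 10; the actual numbers will differ because the weights are initialized randomly):

import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding

model = Sequential()
model.add(Embedding(input_dim=10, output_dim=3, input_length=6))
model.compile('rmsprop', 'mse')

vec = np.array([[0, 0, 1, 2, 3, 4]])   # shape (1, 6): one padded sentence
print(model.predict(vec)[0])            # shape (6, 3): one 3-dim vector per token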
The first parameter, input_dim, is the size of the vocabulary mapped to embedding vectors. You can see it by running
model.layers[0].get_weights()
since the embedding layer is usually the first layer of the model. If input_dim were 10, the embedding layer would contain ten vectors of size output_dim. Notice that the first row corresponds to the mapping of 0 in the input vector (0 --> [ 0.00251105, 0.00724941, -0.01146401]), the second to 1, and so on.
[array([[ 0.00251105, 0.00724941, -0.01146401],
[ 0.03071865, 0.00953215, -0.01349484],
[ 0.02962008, 0.04860269, -0.04597988],
[-0.01875228, 0.03349927, -0.03210936],
[-0.02512982, 0.04811014, 0.03172458],
[-0.00569617, -0.02348857, -0.00098624],
[ 0.01327456, 0.02390958, 0.00754261],
[-0.04041355, 0.03457253, -0.02879228],
[-0.02695872, 0.02807242, 0.03338097],
[-0.02057508, 0.00174383, 0.00792078]], dtype=float32)]
Increasing input_dim allows you to map a bigger vocabulary, but it also increases the number of parameters of the embedding layer. The number of parameters is input_dim x output_dim.
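Continuing the sketch above (input_dim = 10, output_dim = 3), you can check both points directly:

weights = model.layers[0].get_weights()[0]
print(weights.shape)                   # (10, 3), i.e. (input_dim, output_dim)
print(model.layers[0].count_params())  # 30 = input_dim x output_dim
print(weights[0])                      # the embedding of token 0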
As far as I understand, these vectors are initialized randomly and trained like any other layer using the optimizer's algorithm. You can, however, use different algorithms like word2vec, or pretrained vectors like GloVe (https://nlp.stanford.edu/projects/glove/). The idea is that each word represents a unique position in the space (described by its vector), so that you can apply some vector math to the word's semantics (meaning), e.g. W('cheeseburger') - W('cheese') = W('hamburger'), or W('prince') - W('man') + W('woman') = W('princess'). See more e.g. at https://www.oreilly.com/learning/capturing-semantic-meanings-using-deep-learning
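If you want to use pretrained vectors, a common pattern is to pass them as the initial weights of the Embedding layer and freeze it. A minimal sketch (vocab_size, embed_dim and the all-zeros embedding_matrix are placeholders; in practice you would fill the matrix from the GloVe file using your tokenizer's word index):

import numpy as np
from tensorflow.keras.layers import Embedding

vocab_size = 10   # input_dim
embed_dim = 50    # must match the dimension of the pretrained vectors

# one pretrained vector per token index, shape (vocab_size, embed_dim)
embedding_matrix = np.zeros((vocab_size, embed_dim))

embedding_layer = Embedding(input_dim=vocab_size,
                            output_dim=embed_dim,
                            input_length=6,
                            weights=[embedding_matrix],  # start from the pretrained vectors
                            trainable=False)             # freeze them instead of training further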