Question
I have 3 word embeddings:
- embedding#1: [w11, w12, w13, w14]
- embedding#2: [w21, w22, w23, w24]
- embedding#3: [w31, w32, w33, w34]
Is there a way to get a fourth embedding by adding all three vectors, using the trainable weights from all of them, like:
- embedding#4: [w11 + w21 + w31, w12 + w22 + w32, w13 + w23 + w33, w14 + w24 + w34]
Is there a way to do this in a Keras layer?
Problem
I want to learn word embeddings for the Indonesian language. I plan to do this by training a sequence-prediction model using LSTMs.
However, Indonesian grammar differs from English. In particular, Indonesian words can be modified with prefixes and suffixes: a noun given a prefix can become a verb, and given a suffix can become an adjective. Several affixes can be stacked on one word, so a single base word can have 5 or more variations.
For example:
- tani means farm (verb)
- pe-tani means farmer
- per-tani-an means farm (noun)
- ber-tani means farm (verb, with slightly different meaning)
The semantic transformation performed by attaching a prefix is consistent across words. For example:
- pe-tani is to tani as pe-layan is to layan, as pe-layar is to layar, as pe-tembak is to tembak, and so on.
- per-main-an is to main as per-guru-an is to guru, as per-kira-an is to kira, as per-surat-an is to surat, and so on.
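In vector terms (assuming the learned embeddings capture this regularity), the analogy amounts to:
vec(pe-tani) − vec(tani) ≈ vec(pe-layan) − vec(layan) ≈ vec(pe)
which is exactly what a fixed, trainable embedding per affix, added to the base word's embedding, would encode.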
Therefore, I plan to represent the prefixes and suffixes as embeddings that are added to the base word's embedding, producing a new embedding. The meaning of a composite word is then derived from the embeddings of the base word and its affixes, not stored as a separate embedding. However, I don't know how to do this in a Keras layer. If this has been asked before, I could not find it.
Answer 1:
When you say "three word embeddings", I see three Embedding layers, such as:
from keras.layers import Input, Embedding, Add
from keras.models import Model

input1 = Input((sentenceLength,))
input2 = Input((sentenceLength,))
input3 = Input((sentenceLength,))
emb1 = Embedding(...options...)(input1)
emb2 = Embedding(...options...)(input2)
emb3 = Embedding(...options...)(input3)
You can use a simple Add() layer to sum the three:
summed = Add()([emb1,emb2,emb3])
Then you continue your modeling...
#after creating the rest of the layers and getting the desired output:
model = Model([input1,input2,input3],output)
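For the asker's use case, here is a minimal runnable sketch that sums base-word, prefix, and suffix embeddings before an LSTM sequence-prediction head. The vocabulary sizes, embedding dimension, LSTM width, and the "reserve id 0 for no affix" convention are all made-up placeholders, not part of the original answer:

from keras.layers import Input, Embedding, Add, LSTM, Dense
from keras.models import Model

sentenceLength = 10  # placeholder
embeddingDim = 4     # matches the 4-dimensional vectors in the question

# One input per component; id 0 is reserved for "no prefix"/"no suffix"
# (an assumption), so bare words still contribute a trainable vector
# that can learn to stay near zero
baseInput = Input((sentenceLength,))
prefixInput = Input((sentenceLength,))
suffixInput = Input((sentenceLength,))

baseEmb = Embedding(10000, embeddingDim)(baseInput)   # base-word vocab (placeholder size)
prefixEmb = Embedding(20, embeddingDim)(prefixInput)  # pe-, per-, ber-, ... plus "none"
suffixEmb = Embedding(20, embeddingDim)(suffixInput)  # -an, ... plus "none"

# Element-wise sum: the composite word's embedding is derived from
# its parts, exactly like embedding#4 in the question
summed = Add()([baseEmb, prefixEmb, suffixEmb])

x = LSTM(64)(summed)                            # placeholder sequence model
output = Dense(10000, activation='softmax')(x)  # next-word prediction head

model = Model([baseInput, prefixInput, suffixInput], output)
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')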
If you're not using embedding layers, but you're inputting three vectors:
input1 = Input((4,)) #or perhaps (sentenceLength,4)
input2 = Input((4,))
input3 = Input((4,))
added = Add()([input1,input2,input3])
And the rest is the same.
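As a quick sanity check (shapes and values made up), Add() performs exactly the element-wise sum written out as embedding#4 in the question:

import numpy as np
from keras.layers import Input, Add
from keras.models import Model

input1 = Input((4,))
input2 = Input((4,))
input3 = Input((4,))
added = Add()([input1, input2, input3])
m = Model([input1, input2, input3], added)

a = np.array([[1., 2., 3., 4.]])
print(m.predict([a, a, a]))  # -> [[ 3.  6.  9. 12.]]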
If this is not your question, please give more details about where the three "word embeddings" are coming from, how you intend to select them, etc.
Source: https://stackoverflow.com/questions/47326026/keras-addition-layer-for-embeddings-vectors