Understanding Keras LSTMs

粉色の甜心  2020-11-22 04:39

I am trying to reconcile my understanding of LSTMs, as explained in this post by Christopher Olah, with how they are implemented in Keras. I am also following the blog written by Jason Brownlee.

3 Answers
  •  悲&欢浪女  2020-11-22 05:16

    When return_sequences=True is set on the last RNN layer, you cannot use a plain Dense layer; wrap it in TimeDistributed instead.

    Here is an example piece of code; it might help others.

    import keras

    words = keras.layers.Input(batch_shape=(None, self.maxSequenceLength), name="input")

    # Build a matrix of size vocabularySize x EmbeddingDimension
    # where each row corresponds to a "word embedding" vector.
    # This layer replaces each word-id with a word vector of size EmbeddingDimension.
    embeddings = keras.layers.Embedding(self.vocabularySize, self.EmbeddingDimension,
        name="embeddings")(words)

    # Pass the word vectors through two recurrent (GRU) layers.
    # The hidden-state size of the first layer is 512, of the second 128.
    # Because return_sequences=True, each layer outputs its hidden state at
    # every time step: batchSize x maxSequenceLength x hiddenStateSize.
    hiddenStates = keras.layers.GRU(512, return_sequences=True,
        name="rnn")(embeddings)
    hiddenStates2 = keras.layers.GRU(128, return_sequences=True,
        name="rnn2")(hiddenStates)

    # Apply the same Dense layer, followed by a softmax, to every time step,
    # producing a distribution over the vocabulary at each position.
    denseOutput = keras.layers.TimeDistributed(keras.layers.Dense(self.vocabularySize),
        name="linear")(hiddenStates2)
    predictions = keras.layers.TimeDistributed(keras.layers.Activation("softmax"),
        name="softmax")(denseOutput)

    # Build the computational graph by specifying the input and output of the network.
    model = keras.models.Model(inputs=words, outputs=predictions)
    # model.compile(loss='kullback_leibler_divergence', ...)
    model.compile(loss='sparse_categorical_crossentropy',
        optimizer=keras.optimizers.Adam(lr=0.009,
            beta_1=0.9,
            beta_2=0.999,
            epsilon=None,
            decay=0.01,
            amsgrad=False))
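
    For completeness, here is a minimal, hypothetical sketch of how the compiled model above could be trained; the batch size, sequence length, vocabulary size, and random data below are assumptions for illustration only, not part of the original code.

    # Assumed toy values, not from the answer above.
    import numpy as np

    batchSize, maxSequenceLength, vocabularySize = 32, 20, 10000

    # Inputs: integer word-ids of shape (batchSize, maxSequenceLength).
    # Targets: next-word ids with a trailing axis of size 1, as expected by
    # sparse_categorical_crossentropy applied to a per-timestep softmax output.
    X = np.random.randint(0, vocabularySize, size=(batchSize, maxSequenceLength))
    y = np.random.randint(0, vocabularySize, size=(batchSize, maxSequenceLength, 1))

    model.summary()   # output shape: (None, maxSequenceLength, vocabularySize)
    model.fit(X, y, batch_size=batchSize, epochs=1)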
    
