How to use multilayered bidirectional LSTM in Tensorflow?

囚心锁ツ 2021-02-06 05:02

I want to know how to use multilayered bidirectional LSTM in Tensorflow.

I have already implemented a bidirectional LSTM, but I want to compare that model with a multilayered version.

4 Answers
  • 2021-02-06 05:37

    This is essentially the same as the first answer, but with a small variation in how the scope names are used and with added dropout wrappers. It also takes care of the variable-scope error that the first answer raises.

    def bidirectional_lstm(input_data, num_layers, rnn_size, keep_prob):
    
        output = input_data
        for layer in range(num_layers):
            with tf.variable_scope('encoder_{}'.format(layer),reuse=tf.AUTO_REUSE):
    
                # By giving a different variable scope to each layer, I've ensured that
                # the weights are not shared among the layers. If you want to share the
                # weights, you can do that by giving variable_scope as "encoder" but do
                # make sure first that reuse is set to tf.AUTO_REUSE
    
                cell_fw = tf.contrib.rnn.LSTMCell(rnn_size, initializer=tf.truncated_normal_initializer(-0.1, 0.1, seed=2))
                cell_fw = tf.contrib.rnn.DropoutWrapper(cell_fw, input_keep_prob = keep_prob)
    
                cell_bw = tf.contrib.rnn.LSTMCell(rnn_size, initializer=tf.truncated_normal_initializer(-0.1, 0.1, seed=2))
                cell_bw = tf.contrib.rnn.DropoutWrapper(cell_bw, input_keep_prob = keep_prob)
    
                outputs, states = tf.nn.bidirectional_dynamic_rnn(cell_fw, 
                                                                  cell_bw, 
                                                                  output,
                                                                  dtype=tf.float32)
    
                # Concat the forward and backward outputs
                output = tf.concat(outputs,2)
    
        return output
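
    A minimal, hypothetical usage sketch (the placeholder shape and the num_layers/rnn_size values below are assumptions, adjust them to your data):

    # Hypothetical usage of the function above -- shapes and sizes are assumptions
    import tensorflow as tf

    input_data = tf.placeholder(tf.float32, [None, None, 128])   # [batch, time, features]
    keep_prob = tf.placeholder_with_default(1.0, shape=[])
    encoded = bidirectional_lstm(input_data, num_layers=2, rnn_size=64, keep_prob=keep_prob)
    # encoded has shape [batch, time, 2 * rnn_size]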
    
  • 2021-02-06 05:39

    Building on Taras's answer, here is another example using just a 2-layer bidirectional RNN with GRU cells:

        embedding_weights = tf.Variable(tf.random_uniform([vocabulary_size, state_size], -1.0, 1.0))
        embedding_vectors = tf.nn.embedding_lookup(embedding_weights, tokens)
    
        #First BLSTM
        cell = tf.nn.rnn_cell.GRUCell(state_size)
        cell = tf.nn.rnn_cell.DropoutWrapper(cell, output_keep_prob=1-dropout)
        (forward_output, backward_output), _ = \
            tf.nn.bidirectional_dynamic_rnn(cell, cell, inputs=embedding_vectors,
                                            sequence_length=lengths, dtype=tf.float32,scope='BLSTM_1')
        outputs = tf.concat([forward_output, backward_output], axis=2)
    
        #Second BLSTM using the output of previous layer as an input.
        cell2 = tf.nn.rnn_cell.GRUCell(state_size)
        cell2 = tf.nn.rnn_cell.DropoutWrapper(cell2, output_keep_prob=1-dropout)
        (forward_output, backward_output), _ = \
            tf.nn.bidirectional_dynamic_rnn(cell2, cell2, inputs=outputs,
                                            sequence_length=lengths, dtype=tf.float32,scope='BLSTM_2')
        outputs = tf.concat([forward_output, backward_output], axis=2)
    

    BTW, don't forget to give each layer a different scope name (as done above with 'BLSTM_1' and 'BLSTM_2'); otherwise TensorFlow will complain about already-existing variables. Hope this helps.
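
    For completeness, here is a minimal sketch of the inputs the snippet above assumes (the shapes and sizes are guesses, adjust them to your data):

        # Hypothetical setup for the snippet above -- shapes/values are assumptions
        import tensorflow as tf

        tokens = tf.placeholder(tf.int32, [None, None])          # [batch, time] token ids
        lengths = tf.placeholder(tf.int32, [None])                # true sequence lengths
        dropout = tf.placeholder_with_default(0.0, shape=[])      # dropout probability
        vocabulary_size = 10000
        state_size = 128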

  • 2021-02-06 05:44

    As @Taras pointed out, you can use:

    (1) tf.nn.bidirectional_dynamic_rnn()

    (2) tf.contrib.rnn.stack_bidirectional_dynamic_rnn().

    All previous answers only capture (1), so I give some details on (2), in particular since it usually outperforms (1). For an intuition about the different connectivities see here.

    Let's say you want to create a stack of 3 BLSTM layers, each with 64 nodes:

    num_layers = 3
    num_nodes = 64
    
    
    # Define LSTM cells (assuming LSTMCell here is tf.contrib.rnn.LSTMCell)
    enc_fw_cells = [LSTMCell(num_nodes) for layer in range(num_layers)]
    enc_bw_cells = [LSTMCell(num_nodes) for layer in range(num_layers)]
    
    # Connect LSTM cells bidirectionally and stack
    (all_states, fw_state, bw_state) = tf.contrib.rnn.stack_bidirectional_dynamic_rnn(
            cells_fw=enc_fw_cells, cells_bw=enc_bw_cells, inputs=input_embed, dtype=tf.float32)
    
    # Concatenate results
    for k in range(num_layers):
        if k == 0:
            con_c = tf.concat((fw_state[k].c, bw_state[k].c), 1)
            con_h = tf.concat((fw_state[k].h, bw_state[k].h), 1)
        else:
            con_c = tf.concat((con_c, fw_state[k].c, bw_state[k].c), 1)
            con_h = tf.concat((con_h, fw_state[k].h, bw_state[k].h), 1)
    
    output = tf.contrib.rnn.LSTMStateTuple(c=con_c, h=con_h)
    

    In this case, I use the final states of the stacked biRNN rather than the outputs at all timesteps (saved in all_states), since I was using an encoder-decoder scheme, where the above code was only the encoder.
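
    If you want to feed this concatenated state into a decoder, a minimal sketch could look like the following (the decoder cell and decoder_inputs are hypothetical; the decoder cell size must match the width of the concatenated state):

    # Hypothetical decoder seeded with the concatenated encoder state.
    # num_layers * 2 * num_nodes matches the width of con_c / con_h.
    dec_cell = tf.contrib.rnn.LSTMCell(num_layers * 2 * num_nodes)
    dec_outputs, dec_state = tf.nn.dynamic_rnn(
        dec_cell, decoder_inputs, initial_state=output, dtype=tf.float32)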

  • 2021-02-06 05:58

    You can use two different approaches to build a multilayer BiLSTM model:

    1) Use the output of the previous BiLSTM layer as the input to the next one. To begin, create lists of forward and backward cells of length num_layers, and then loop over them:

    # Build the forward/backward cell lists referred to above.
    # LSTMCell and rnn_size are assumptions here -- use whatever cell/size fits your model.
    cell_forw = [tf.contrib.rnn.LSTMCell(rnn_size) for _ in range(num_layers)]
    cell_back = [tf.contrib.rnn.LSTMCell(rnn_size) for _ in range(num_layers)]

    output = inputs                        # [batch, time, features] tensor fed to the first layer
    batch_size = tf.shape(inputs)[0]

    for n in range(num_layers):
        cell_fw = cell_forw[n]
        cell_bw = cell_back[n]

        state_fw = cell_fw.zero_state(batch_size, tf.float32)
        state_bw = cell_bw.zero_state(batch_size, tf.float32)

        (output_fw, output_bw), last_state = tf.nn.bidirectional_dynamic_rnn(cell_fw, cell_bw, output,
                                                                             initial_state_fw=state_fw,
                                                                             initial_state_bw=state_bw,
                                                                             scope='BLSTM_' + str(n),
                                                                             dtype=tf.float32)

        output = tf.concat([output_fw, output_bw], axis=2)


    2) The other approach, a stacked BiLSTM (e.g. tf.contrib.rnn.stack_bidirectional_dynamic_rnn, covered in another answer here), is also worth a look.
