Converting state-parameters of PyTorch LSTM to Keras LSTM

醉梦人生 2021-02-14 08:47

I was trying to port an existing trained PyTorch model into Keras.

During the porting, I got stuck at LSTM layer.

The Keras implementation of the LSTM layer seems to have three state parameters (kernel, recurrent_kernel, and a single bias vector), while the PyTorch implementation has four (weight_ih, weight_hh, bias_ih, and bias_hh). How should the PyTorch parameters be mapped onto the Keras ones?

1 Answer
  • 2021-02-14 09:17

    They are really not that different. If you sum up the two bias vectors in PyTorch, the equations will be the same as what's implemented in Keras.

    This is the LSTM formula on PyTorch documentation:
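        i_t = sigmoid(W_ii x_t + b_ii + W_hi h_(t-1) + b_hi)
        f_t = sigmoid(W_if x_t + b_if + W_hf h_(t-1) + b_hf)
        g_t = tanh(W_ig x_t + b_ig + W_hg h_(t-1) + b_hg)
        o_t = sigmoid(W_io x_t + b_io + W_ho h_(t-1) + b_ho)
        c_t = f_t * c_(t-1) + i_t * g_t
        h_t = o_t * tanh(c_t)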

    PyTorch uses two separate bias vectors: one for the input transformation (subscripts beginning with i) and one for the recurrent transformation (subscripts beginning with h).

    In the Keras LSTMCell:

            # Input transformations: one kernel and a single bias per gate
            # (i = input gate, f = forget gate, c = cell candidate, o = output gate).
            x_i = K.dot(inputs_i, self.kernel_i)
            x_f = K.dot(inputs_f, self.kernel_f)
            x_c = K.dot(inputs_c, self.kernel_c)
            x_o = K.dot(inputs_o, self.kernel_o)
            if self.use_bias:
                x_i = K.bias_add(x_i, self.bias_i)
                x_f = K.bias_add(x_f, self.bias_f)
                x_c = K.bias_add(x_c, self.bias_c)
                x_o = K.bias_add(x_o, self.bias_o)

            if 0 < self.recurrent_dropout < 1.:
                h_tm1_i = h_tm1 * rec_dp_mask[0]
                h_tm1_f = h_tm1 * rec_dp_mask[1]
                h_tm1_c = h_tm1 * rec_dp_mask[2]
                h_tm1_o = h_tm1 * rec_dp_mask[3]
            else:
                h_tm1_i = h_tm1
                h_tm1_f = h_tm1
                h_tm1_c = h_tm1
                h_tm1_o = h_tm1
            # Recurrent transformations: a recurrent kernel per gate,
            # but no second bias vector, unlike PyTorch.
            i = self.recurrent_activation(x_i + K.dot(h_tm1_i,
                                                      self.recurrent_kernel_i))
            f = self.recurrent_activation(x_f + K.dot(h_tm1_f,
                                                      self.recurrent_kernel_f))
            c = f * c_tm1 + i * self.activation(x_c + K.dot(h_tm1_c,
                                                            self.recurrent_kernel_c))
            o = self.recurrent_activation(x_o + K.dot(h_tm1_o,
                                                      self.recurrent_kernel_o))
    

    There's only one bias, added in the input transformation. The equations become equivalent, however, if you sum the two PyTorch biases, since sigmoid(W_ii x_t + b_ii + W_hi h_(t-1) + b_hi) = sigmoid(W_ii x_t + W_hi h_(t-1) + (b_ii + b_hi)).

    The two-bias LSTM is what's implemented in cuDNN (see the developer guide). I'm really not that familiar with PyTorch, but I guess that's why they use two bias parameters. In Keras, the CuDNNLSTM layer also has two bias weight vectors.
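    In practice, that means you can port the parameters by transposing the PyTorch weight matrices and summing the two PyTorch bias vectors. Below is a minimal sketch of that conversion (the names pt_lstm and keras_lstm are placeholders): it assumes a single-layer torch.nn.LSTM, an already-built tf.keras.layers.LSTM with the same hidden size and recurrent_activation='sigmoid' (to match PyTorch's sigmoid gates), and it relies on both frameworks storing the gates in (i, f, g/c, o) order:

        def pytorch_lstm_to_keras_weights(pt_lstm):
            # PyTorch packs the gates row-wise as (i, f, g, o); Keras packs
            # them column-wise as (i, f, c, o). PyTorch's g is Keras' c, so
            # the gate order already matches and only a transpose is needed.
            kernel = pt_lstm.weight_ih_l0.detach().numpy().T            # (input_size, 4*units)
            recurrent_kernel = pt_lstm.weight_hh_l0.detach().numpy().T  # (units, 4*units)
            # Sum the two PyTorch bias vectors into the single Keras bias.
            bias = (pt_lstm.bias_ih_l0 + pt_lstm.bias_hh_l0).detach().numpy()
            return [kernel, recurrent_kernel, bias]

        keras_lstm.set_weights(pytorch_lstm_to_keras_weights(pt_lstm))

    set_weights expects the LSTM weights in the order [kernel, recurrent_kernel, bias], which is exactly what the function returns.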
