I was trying to port an existing trained PyTorch model into Keras.
During the porting, I got stuck at the LSTM layer.
The Keras implementation of the LSTM layer seems to have a different structure from PyTorch's: PyTorch uses two bias vectors where Keras uses only one. Are the two formulations equivalent?
They are really not that different. If you sum up the two bias vectors in PyTorch, the equations will be the same as what's implemented in Keras.
This is the LSTM formula from the PyTorch documentation:
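$$
\begin{aligned}
i_t &= \sigma(W_{ii} x_t + b_{ii} + W_{hi} h_{t-1} + b_{hi}) \\
f_t &= \sigma(W_{if} x_t + b_{if} + W_{hf} h_{t-1} + b_{hf}) \\
g_t &= \tanh(W_{ig} x_t + b_{ig} + W_{hg} h_{t-1} + b_{hg}) \\
o_t &= \sigma(W_{io} x_t + b_{io} + W_{ho} h_{t-1} + b_{ho}) \\
c_t &= f_t \odot c_{t-1} + i_t \odot g_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
$$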
PyTorch uses two separate bias vectors: one for the input transformation (subscripts starting with i, e.g. $b_{ii}$) and one for the recurrent transformation (subscripts starting with h, e.g. $b_{hi}$).
In Keras' LSTMCell:
x_i = K.dot(inputs_i, self.kernel_i)
x_f = K.dot(inputs_f, self.kernel_f)
x_c = K.dot(inputs_c, self.kernel_c)
x_o = K.dot(inputs_o, self.kernel_o)
if self.use_bias:
    # a single bias per gate, applied to the input transformation only
    x_i = K.bias_add(x_i, self.bias_i)
    x_f = K.bias_add(x_f, self.bias_f)
    x_c = K.bias_add(x_c, self.bias_c)
    x_o = K.bias_add(x_o, self.bias_o)

if 0 < self.recurrent_dropout < 1.:
    h_tm1_i = h_tm1 * rec_dp_mask[0]
    h_tm1_f = h_tm1 * rec_dp_mask[1]
    h_tm1_c = h_tm1 * rec_dp_mask[2]
    h_tm1_o = h_tm1 * rec_dp_mask[3]
else:
    h_tm1_i = h_tm1
    h_tm1_f = h_tm1
    h_tm1_c = h_tm1
    h_tm1_o = h_tm1
# the recurrent transformation adds no bias of its own
i = self.recurrent_activation(x_i + K.dot(h_tm1_i, self.recurrent_kernel_i))
f = self.recurrent_activation(x_f + K.dot(h_tm1_f, self.recurrent_kernel_f))
c = f * c_tm1 + i * self.activation(x_c + K.dot(h_tm1_c, self.recurrent_kernel_c))
o = self.recurrent_activation(x_o + K.dot(h_tm1_o, self.recurrent_kernel_o))
Only one bias is added, in the input transformation; the recurrent transformation gets none. The equations become equivalent if you fold PyTorch's two biases into one, since for each gate $\sigma(W_{ii} x_t + b_{ii} + W_{hi} h_{t-1} + b_{hi}) = \sigma(W_{ii} x_t + W_{hi} h_{t-1} + (b_{ii} + b_{hi}))$.
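In practice this means you can copy a PyTorch LSTM's weights into a Keras LSTM by transposing the two weight matrices and summing the two bias vectors. Here is a minimal sketch, assuming a single-layer, unidirectional torch.nn.LSTM and an already-built Keras LSTM of matching sizes; it relies on both frameworks using the same gate order (i, f, g/c, o), which holds for current versions but is worth double-checking. The helper name is mine, not from either library:

def copy_pytorch_lstm_to_keras(pt_lstm, keras_lstm):
    # Hypothetical helper: port the weights of a single-layer,
    # unidirectional torch.nn.LSTM into a Keras LSTM layer.
    sd = pt_lstm.state_dict()
    # PyTorch stores (4*units, input_dim); Keras expects (input_dim, 4*units).
    kernel = sd['weight_ih_l0'].numpy().T
    recurrent_kernel = sd['weight_hh_l0'].numpy().T
    # Keras keeps a single bias vector, so fold PyTorch's two into one.
    bias = (sd['bias_ih_l0'] + sd['bias_hh_l0']).numpy()
    keras_lstm.set_weights([kernel, recurrent_kernel, bias])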
The two-bias LSTM is what cuDNN implements (see the cuDNN developer guide). I'm not that familiar with PyTorch, but I'd guess that's why it uses two bias parameters. In Keras, the CuDNNLSTM layer likewise has two bias weight vectors.