This section introduces several classic RNN models: the GRU, the LSTM, the deep RNN, and the bidirectional RNN. In Andrew Ng's DL course, the GRU is described as a refinement of the LSTM: the GRU cell is simpler, which gives it an edge when building large network structures, while the LSTM cell is more powerful. A deep RNN stacks multiple recurrent layers, giving it a stronger ability to extract features. A bidirectional RNN is very effective in tasks such as machine translation, where pinning down the meaning of a word needs both the words before it and the words after it. It is also worth noting that these networks have mechanisms for long-term memory, which help mitigate the vanishing-gradient problem.
GRU
The GRU's gates are what supposedly give it long-term memory, and the mechanism mainly relies on the sigmoid activation, which outputs a number in [0, 1]. For the reset gate R, the closer the value is to 0, the more of the previous hidden state is forgotten when forming the candidate state; the update gate Z then decides how much of the old hidden state is carried forward, which is what captures the long-term dependencies.
import torch

def gru(inputs, state, params):
    # params: gate weights/biases (z: update, r: reset, h: candidate) and the output layer (q)
    W_xz, W_hz, b_z, W_xr, W_hr, b_r, W_xh, W_hh, b_h, W_hq, b_q = params
    H, = state
    outputs = []
    for X in inputs:  # inputs: list of (batch_size, vocab_size) tensors, one per time step
        Z = torch.sigmoid(torch.matmul(X, W_xz) + torch.matmul(H, W_hz) + b_z)  # update gate
        R = torch.sigmoid(torch.matmul(X, W_xr) + torch.matmul(H, W_hr) + b_r)  # reset gate
        H_tilda = torch.tanh(torch.matmul(X, W_xh) + R * torch.matmul(H, W_hh) + b_h)  # candidate state
        H = Z * H + (1 - Z) * H_tilda  # keep the old state where Z is close to 1
        Y = torch.matmul(H, W_hq) + b_q  # output layer
        outputs.append(Y)
    return outputs, (H,)
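Below is a minimal sketch, not part of the original post, of how this scratch-built gru could be exercised end to end; the vocabulary size, random initialization, and zero inputs are assumptions chosen only to illustrate the expected shapes.

vocab_size, num_hiddens, batch_size, num_steps = 1027, 256, 2, 5  # assumed sizes for illustration

def three():
    # one input-to-hidden weight, one hidden-to-hidden weight, one bias
    return (torch.randn(vocab_size, num_hiddens) * 0.01,
            torch.randn(num_hiddens, num_hiddens) * 0.01,
            torch.zeros(num_hiddens))

W_xz, W_hz, b_z = three()                            # update gate parameters
W_xr, W_hr, b_r = three()                            # reset gate parameters
W_xh, W_hh, b_h = three()                            # candidate hidden state parameters
W_hq = torch.randn(num_hiddens, vocab_size) * 0.01   # output layer weight
b_q = torch.zeros(vocab_size)                        # output layer bias
params = [W_xz, W_hz, b_z, W_xr, W_hr, b_r, W_xh, W_hh, b_h, W_hq, b_q]

# one (batch_size, vocab_size) tensor per time step, e.g. one-hot encoded characters
inputs = [torch.zeros(batch_size, vocab_size) for _ in range(num_steps)]
state = (torch.zeros(batch_size, num_hiddens),)

outputs, (H,) = gru(inputs, state, params)
print(len(outputs), outputs[0].shape, H.shape)
# 5 torch.Size([2, 1027]) torch.Size([2, 256])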
LSTM
def lstm(inputs, state, params):
    # params: gate weights/biases (i: input, f: forget, o: output, c: candidate cell) and the output layer (q)
    [W_xi, W_hi, b_i, W_xf, W_hf, b_f, W_xo, W_ho, b_o, W_xc, W_hc, b_c, W_hq, b_q] = params
    (H, C) = state
    outputs = []
    for X in inputs:  # inputs: list of (batch_size, vocab_size) tensors, one per time step
        I = torch.sigmoid(torch.matmul(X, W_xi) + torch.matmul(H, W_hi) + b_i)  # input gate
        F = torch.sigmoid(torch.matmul(X, W_xf) + torch.matmul(H, W_hf) + b_f)  # forget gate
        O = torch.sigmoid(torch.matmul(X, W_xo) + torch.matmul(H, W_ho) + b_o)  # output gate
        C_tilda = torch.tanh(torch.matmul(X, W_xc) + torch.matmul(H, W_hc) + b_c)  # candidate cell state
        C = F * C + I * C_tilda  # additive cell-state update
        H = O * C.tanh()  # hidden state
        Y = torch.matmul(H, W_hq) + b_q  # output layer
        outputs.append(Y)
    return outputs, (H, C)
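For reference, and not part of the original post, the per-step updates computed in the loop above can be written out as the standard LSTM equations in the row-vector layout this code uses; the additive cell-state update is what lets information (and gradients) flow across many time steps.

$$
\begin{aligned}
I_t &= \sigma(X_t W_{xi} + H_{t-1} W_{hi} + b_i) \\
F_t &= \sigma(X_t W_{xf} + H_{t-1} W_{hf} + b_f) \\
O_t &= \sigma(X_t W_{xo} + H_{t-1} W_{ho} + b_o) \\
\tilde{C}_t &= \tanh(X_t W_{xc} + H_{t-1} W_{hc} + b_c) \\
C_t &= F_t \odot C_{t-1} + I_t \odot \tilde{C}_t \\
H_t &= O_t \odot \tanh(C_t)
\end{aligned}
$$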
Deep recurrent neural network
num_hiddens = 256
num_epochs, num_steps, batch_size, lr, clipping_theta = 160, 35, 32, 1e2, 1e-2
pred_period, pred_len, prefixes = 40, 50, ['分开', '不分开']
lr = 1e-2  # note: the learning rate needs adjusting for the nn-module implementation
# stacked LSTM cell (despite the variable name, nn.LSTM is used here); num_layers is the number of recurrent layers in the deep RNN
gru_layer = nn.LSTM(input_size=vocab_size, hidden_size=num_hiddens, num_layers=2)
model = d2l.RNNModel(gru_layer, vocab_size).to(device)
d2l.train_and_predict_rnn_pytorch(model, num_hiddens, vocab_size, device,
                                  corpus_indices, idx_to_char, char_to_idx,
                                  num_epochs, num_steps, lr, clipping_theta,
                                  batch_size, pred_period, pred_len, prefixes)
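As a quick illustration, not from the original post, the following shape check shows what a 2-layer nn.LSTM returns: per-step outputs come from the top layer only, while the hidden and cell states keep one slice per layer (the sizes are the hyperparameters used above, with an assumed vocabulary size).

import torch
import torch.nn as nn

vocab_size, num_hiddens, num_steps, batch_size = 1027, 256, 35, 32  # vocab_size assumed for illustration
layer = nn.LSTM(input_size=vocab_size, hidden_size=num_hiddens, num_layers=2)

X = torch.zeros(num_steps, batch_size, vocab_size)  # (seq_len, batch, input_size)
Y, (H, C) = layer(X)
print(Y.shape)  # torch.Size([35, 32, 256]) -> top-layer output at every time step
print(H.shape)  # torch.Size([2, 32, 256])  -> final hidden state of each of the 2 layers
print(C.shape)  # torch.Size([2, 32, 256])  -> final cell state of each of the 2 layers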
Bidirectional recurrent neural network
As a side note, the concat operation also shows up in the previous networks; concatenating features is very handy, and in a bidirectional RNN it is exactly how the forward and backward hidden states are combined at each time step (see the shape check after the training snippet below).
num_hiddens = 128
num_epochs, num_steps, batch_size, lr, clipping_theta = 160, 35, 32, 1e-2, 1e-2
pred_period, pred_len, prefixes = 40, 50, ['分开', '不分开']
lr = 1e-2  # note: adjust the learning rate
# bidirectional GRU: bidirectional=True adds a second pass that reads the sequence backwards
gru_layer = nn.GRU(input_size=vocab_size, hidden_size=num_hiddens, bidirectional=True)
model = d2l.RNNModel(gru_layer, vocab_size).to(device)
d2l.train_and_predict_rnn_pytorch(model, num_hiddens, vocab_size, device,
                                  corpus_indices, idx_to_char, char_to_idx,
                                  num_epochs, num_steps, lr, clipping_theta,
                                  batch_size, pred_period, pred_len, prefixes)
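A minimal shape check, again only an illustration with an assumed vocabulary size: with bidirectional=True, the forward and backward hidden states are concatenated along the feature dimension, so each per-step output has 2 * num_hiddens features.

import torch
import torch.nn as nn

vocab_size, num_hiddens, num_steps, batch_size = 1027, 128, 35, 32  # vocab_size assumed for illustration
layer = nn.GRU(input_size=vocab_size, hidden_size=num_hiddens, bidirectional=True)

X = torch.zeros(num_steps, batch_size, vocab_size)  # (seq_len, batch, input_size)
Y, H = layer(X)
print(Y.shape)  # torch.Size([35, 32, 256]) -> 2 * num_hiddens: [forward ; backward] concatenated
print(H.shape)  # torch.Size([2, 32, 128])  -> one final hidden state per direction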
A few closing words
Studying RNNs is not really my thing, so I will just raise a couple of questions:
- How exactly do units like the LSTM/GRU achieve long-term memory?
- What makes the concat operation so effective?
Source: CSDN
Author: 蓝胖子先生
Link: https://blog.csdn.net/gongsai20141004277/article/details/104363410