implementing RNN with numpy

我只是一个虾纸丫 提交于 2019-12-22 04:04:35

问题


I'm trying to implement the recurrent neural network with numpy.

My current input and output designs are as follow:

x is of shape: (sequence length, batch size, input dimension)

h : (number of layers, number of directions, batch size, hidden size)

initial weight: (number of directions, 2 * hidden size, input size + hidden size)

weight: (number of layers -1, number of directions, hidden size, directions*hidden size + hidden size)

bias: (number of layers, number of directions, hidden size)

I have looked up pytorch API of RNN the as reference (https://pytorch.org/docs/stable/nn.html?highlight=rnn#torch.nn.RNN), but have slightly changed it to include initial weight as input. (output shapes are supposedly the same as in pytorch)

While it is running, I cannot determine whether it is behaving right, as I am inputting randomly generated numbers as input.

In particular, I am not so certain whether my input shapes are designed correctly.

Could any expert give me a guidance?

def rnn(xs, h, w0, w=None, b=None, num_layers=2, nonlinearity='tanh', dropout=0.0, bidirectional=False, training=True):
    num_directions = 2 if bidirectional else 1
    batch_size = xs.shape[1]
    input_size = xs.shape[2]
    hidden_size = h.shape[3]
    hn = []
    y = [None]*len(xs)

    for l in range(num_layers):
        for d in range(num_directions):
            if l==0 and d==0:
                wi = w0[d, :hidden_size,  :input_size].T
                wh = w0[d, hidden_size:,  input_size:].T
                wi = np.reshape(wi, (1,)+wi.shape)
                wh = np.reshape(wh, (1,)+wh.shape)
            else:
                wi = w[max(l-1,0), d, :,  :hidden_size].T
                wh = w[max(l-1,0), d, :,  hidden_size:].T
            for i,x in enumerate(xs):
                if l==0 and d==0:
                    ht = np.tanh(np.dot(x, wi) + np.dot(h[l, d], wh) + b[l, d][np.newaxis])
                    ht = np.reshape(ht,(batch_size, hidden_size)) #otherwise, shape is (bs,1,hs)
                else:
                    ht = np.tanh(np.dot(y[i], wi) + np.dot(h[l, d], wh) + b[l, d][np.newaxis])
                y[i] = ht
            hn.append(ht)
    y = np.asarray(y)
    y = np.reshape(y, y.shape+(1,))
    return np.asarray(y), np.asarray(hn)

回答1:


Regarding the shape, it probably makes sense if that's how PyTorch does it, but the Tensorflow way is a bit more intuitive - (batch_size, seq_length, input_size) - batch_size sequences of seq_length length where each element has input_size size. Both approaches can work, so I guess it's a matter of preferences.

To see whether your rnn is behaving appropriately, I'd just print the hidden state at each time step, run it on some small random data (e.g. 5 vectors, 3 elements each) and compare the results with your manual calculations.

Looking at your code, I'm unsure if it does what it's supposed to, but instead of doing this on your own based on an existing API, I'd recommend you read and try to replicate this awesome tutorial from wildml (in part 2 there's a pure numpy implementation).



来源:https://stackoverflow.com/questions/51465813/implementing-rnn-with-numpy

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!