Keras - “Convert” a trained many-to-many model to one-to-many model (generator)

Submitted by 拟墨画扇 on 2020-01-07 04:01:08

Question


I'm trying to understand RNNs (no specific architecture) using Reber Grammar inputs (not embedded for now). You can find the Jupyter notebook at this link (please disregard the markdown cells; I failed on the first version with output and it's not up to date :) ).

For every timestep, I provide the input and expected output for the training (so it's a many-to-many model).

  • Input/output are one-hot encoded (based on the string "BTSXPVE"), so for example:

    • B is [1, 0, 0, 0, 0, 0, 0]
    • V is [0, 0, 0, 0, 0, 1, 0]
  • For the timesteps, I have strings of unknown length (shown unencoded here for clarity), for example:

    • BPVVE
    • BPVPXVPXVPXVVE

so I decided to pad them to 20 timesteps.

  • For the batch size, I'm free to choose. I've generated 2048 encoded strings for training and 256 for testing.

So my input tensor is (2048, 20, 7). My output tensor is also (2048, 20, 7) because for every timestep I would like to get the prediction.
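For concreteness, here is a minimal sketch of how such an encoding/padding step could look (the helper encode_string is my own name, not from the original post; I front-pad with zero vectors, matching pad_sequences' default):

import numpy as np

alphabet = "BTSXPVE"
maxlen = 20

def encode_string(s):
    # one-hot encode a Reber string, front-padded with zero vectors to maxlen timesteps
    onehot = np.zeros((maxlen, len(alphabet)))
    for t, ch in enumerate(s, start=maxlen - len(s)):
        onehot[t, alphabet.index(ch)] = 1
    return onehot

X_train = np.stack([encode_string(s) for s in ["BPVVE", "BPVPXVPXVPXVVE"]])
print(X_train.shape)  # (2, 20, 7)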

So I trained 3 many-to-many models (SimpleRNN, GRU and LSTM) with code like the following.

from keras.models import Sequential
from keras.layers import LSTM

model = Sequential()

# return_sequences=True -> one 7-dim prediction per timestep (many-to-many)
model.add(LSTM(units=7, input_shape=(maxlen, 7), return_sequences=True))
model.compile(loss='mse',
              optimizer='Nadam',
              metrics=['mean_squared_error'])

history = model.fit(X_train, y_train, validation_data=(X_test, y_test),
                    epochs=1500, batch_size=1024)

As expected, for every timestep I get the probability of each value; for example (after a bit of cleanup):

After B, the prediction is [0, 0.622, 0, 0, 0.401, 0, 0] (about 62% chance of T, 40% chance of P).

This is correct according to the Reber grammar graph used to generate the strings.


Now, I would like to use this model to generate strings (so a one-to-many model), but I have no idea how to keep the trained weights and use the model as a generator.

I thought of feeding only the input for B (padded to 20 timesteps), getting the result, concatenating B with the best index of the output, padding that to 20 timesteps, feeding the new input to the NN, and so on (as sketched below). But I'm pretty sure this is not the way we should do it :s
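For reference, that loop would look roughly like this (just a sketch of the idea; the greedy argmax choice and the stopping condition are my assumptions):

import numpy as np

alphabet = "BTSXPVE"
maxlen = 20

seq = [alphabet.index('B')]
for _ in range(maxlen - 1):
    X = np.zeros((1, maxlen, 7))
    for t, idx in enumerate(seq, start=maxlen - len(seq)):  # front-pad with zeros
        X[0, t, idx] = 1
    probs = model.predict(X)[0, -1]   # prediction at the last (real) timestep
    best = int(np.argmax(probs))      # most likely next letter
    seq.append(best)
    if alphabet[best] == 'E':         # stop at the end symbol
        break

print("".join(alphabet[i] for i in seq))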

Moreover, I tried to input 'B' and 'T' to check the probabilities of the output (it should be S or X), but I got:

import numpy as np
from keras.preprocessing import sequence

X = np.array([[[1,0,0,0,0,0,0], [0,1,0,0,0,0,0]]])  # [[[B, T]]]
X = sequence.pad_sequences(X, maxlen=20)
print(model.predict(X)[0])

[0, 0.106, 0.587, 0.1, 0, 0.171, 0.007]

What I understand is that it predicts T (10%), S (60%), X (10%), V (18%), but after BT I should get a higher probability for X and nearly none for V/T (because V and T are only possible after B or P, not after T). It's as if my model didn't take the n-1 previous timesteps into account. So maybe my model is wrong :(
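One detail worth double-checking (my note, not from the original post): model.predict(X)[0] is a (20, 7) array with one prediction per timestep, and pad_sequences pads at the front by default, so the row that actually follows B, T is the last one:

preds = model.predict(X)   # shape (1, 20, 7): one distribution per timestep
print(preds[0, -1])        # the output after the real letters B, T ('pre' padding puts them last)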

Many thanks for your support,


Answer 1:


You can remake this model as a stateful=True model. Make it work with timesteps=1 (or None for variable length).

Remaking the model:

newModel = Sequential()

# batch_input_shape=(1, 1, 7): one sequence at a time, one timestep at a time
newModel.add(LSTM(units=7, stateful=True, batch_input_shape=(1, 1, 7), return_sequences=True))

Getting the weights from the other model:

newModel.set_weights(model.get_weights())

Using the model in predictions:

Now, with this model, you must input only one step at a time. And you must be careful to call reset_states() every time you're going to input a new sequence:

So, suppose we've got the starting letter B.

import numpy as np

alphabet = "BTSXPVE"

# one-hot vector for the starting letter B, shaped (batch=1, timesteps=1, features=7)
startingLetter = np.zeros((1, 1, 7))
startingLetter[0, 0, alphabet.index('B')] = 1

# we are starting a new "sentence", so, let's reset states:
newModel.reset_states()

# now the prediction loop: feed one letter, sample the next, stop at 'E'
generated = "B"
nextLetter = startingLetter
while generated[-1] != 'E':
    probs = np.clip(newModel.predict(nextLetter)[0, 0], 0, None) + 1e-8  # mse-trained outputs can dip below 0
    index = int(np.random.choice(7, p=probs / probs.sum()))              # choose one letter from the probabilities
    generated += alphabet[index]
    nextLetter = np.zeros((1, 1, 7))
    nextLetter[0, 0, index] = 1
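Note that choosing the next letter by sampling from the predicted distribution, rather than always taking the argmax, matters here: the Reber graph contains loops, so a greedy argmax can get stuck repeating the same letter, while sampling reproduces the grammar's branching.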

About the quality of the results... maybe your model is just too small for the task.

You can try more layers, for instance:

model = Sequential()

model.add(LSTM(units=50, input_shape=(maxlen, 7), return_sequences=True))
model.add(LSTM(units=30, return_sequences=True))
model.add(LSTM(units=7, return_sequences=True))

This choice was arbitrary; I don't know whether it's big enough or overkill for your data.



Source: https://stackoverflow.com/questions/46980166/keras-convert-a-trained-many-to-many-model-to-one-to-many-model-generator
