Question
I'm trying to understand RNNs (no particular architecture) using Reber Grammar inputs (not embedded for now). You can find the Jupyter notebook at this link (please disregard the markdown cells: I messed up the first version with its output, so they're not up to date :) ).
For every timestep, I provide the input and expected output for the training (so it's a many-to-many model).
Input and output are one-hot encoded (based on the string "BTSXPVE"), so for example:
- B is [1, 0, 0, 0, 0, 0, 0]
- V is [0, 0, 0, 0, 0, 1, 0]
For the timesteps, I have strings of unknown length (shown unencoded here to make it clearer), for example:
- BPVVE
- BPVPXVPXVPXVVE
so I decided to pad them to 20 timesteps.
- For the batch size, I'm free to choose. I've generated 2048 encoded strings for training and 256 for testing.
So my input tensor is (2048, 20, 7). My output tensor is also (2048, 20, 7), because I want a prediction for every timestep.
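For reference, here is a minimal sketch of how strings like these can be one-hot encoded and padded into such a tensor (the encode_strings helper and the choice to pad at the end with zero vectors are just for illustration, not necessarily what the notebook does):

import numpy as np

ALPHABET = "BTSXPVE"
maxlen = 20

def encode_strings(strings, maxlen=maxlen):
    # One-hot encode each character; remaining timesteps stay as zero vectors (padding)
    X = np.zeros((len(strings), maxlen, len(ALPHABET)), dtype=np.float32)
    for i, s in enumerate(strings):
        for t, ch in enumerate(s[:maxlen]):
            X[i, t, ALPHABET.index(ch)] = 1.0
    return X

X_small = encode_strings(["BPVVE", "BPVPXVPXVPXVVE"])   # shape (2, 20, 7)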
So I trained 3 many-to-many models (SimpleRNN, GRU and LSTM) with code like the following.
from keras.models import Sequential
from keras.layers import LSTM

model = Sequential()
model.add(LSTM(units=7, input_shape=(maxlen, 7), return_sequences=True))
model.compile(loss='mse',
              optimizer='Nadam',
              metrics=['mean_squared_error'])
history = model.fit(X_train, y_train, validation_data=(X_test, y_test),
                    epochs=1500, batch_size=1024)
As expected, for every timestep I get the probability of each possible value, for example (after a bit of cleanup):
B predicts [0, 0.622, 0, 0, 0.401, 0, 0] (roughly 60% chance of a T or 40% chance of a P)
This is correct based on the graph used to generate a word.
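To make outputs like that easier to read, a small helper (hypothetical, just for illustration) can map a probability vector back to letters:

def readable(probs, alphabet="BTSXPVE", threshold=0.05):
    # Keep only the letters whose predicted probability is above a small threshold
    return {alphabet[i]: round(float(p), 3) for i, p in enumerate(probs) if p > threshold}

# readable([0, 0.622, 0, 0, 0.401, 0, 0]) -> {'T': 0.622, 'P': 0.401}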
Now I would like to use this model to generate strings (so a one-to-many model), but I have no idea how to keep the trained model and use it as a generator.
I thought about feeding only the input for B (padded to 20 timesteps), getting the result, concatenating B with the best index of the output, padding that to 20 timesteps, feeding the new input to the NN, and so on. But I'm pretty sure this is not the way we should do it :s
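Concretely, the idea I have in mind looks something like this sketch (argmax choice, manual re-padding at the end of each step; none of this is code from my notebook):

seq = [np.eye(7)["BTSXPVE".index('B')]]           # list of one-hot vectors, starting with B
for _ in range(maxlen - 1):
    X = np.zeros((1, maxlen, 7), dtype=np.float32)
    X[0, :len(seq)] = seq                          # pad the rest with zero vectors
    probs = model.predict(X)[0][len(seq) - 1]      # prediction at the last real timestep
    next_idx = int(probs.argmax())                 # take the best index
    seq.append(np.eye(7)[next_idx])
    if "BTSXPVE"[next_idx] == 'E':                 # E ends a Reber string
        break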
Moreover, I tried to input 'B' and 'T' to check the probabilities of the output (it should be S or X), but I got:
X = np.array([[[1,0,0,0,0,0,0], [0,1,0,0,0,0,0]]])  # [[[B, T]]]
X = sequence.pad_sequences(X, maxlen=20)
print(model.predict(X)[0])
[0, 0.106, 0.587, 0.1, 0, 0.171, 0.007]
What I understand is that it predicts T (10%), S (60%), X (10%), V (18%), but after BT I should get a much higher percentage on X and nearly none on V/T (because V and T after a T are only possible after B/P). It's as if my model didn't take the n-1 previous timesteps into account. So maybe my model is wrong :(
Many thanks for your support,
Answer 1:
You can remake this model as a stateful=True model. Make it work with timesteps=1 (or None for variable length).
Remaking the model:
newModel = Sequential()
newModel.add(LSTM(units=7, stateful=True, batch_input_shape=(1, 1, 7), return_sequences=True))
Getting the weights from the other model:
newModel.set_weights(model.get_weights())
Using the model in predictions:
Now, with this model, you must input only one step at a time. And you must be careful to call reset_states() every time you're going to input a new sequence:
So, suppose we've got the starting letter B.
startingLetter = oneHotForBWithShape((1,1,7))   # placeholder: a one-hot vector for B with shape (1, 1, 7)

#we are starting a new "sentence", so, let's reset states:
newModel.reset_states()

#now the prediction loop:
nextLetter = startingLetter
while nextLetter != endLetter:
    nextLetter = newModel.predict(nextLetter)
    nextLetter = chooseOneFromTheProbabilities(nextLetter)   # placeholder: pick a letter from the probabilities and one-hot encode it
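For illustration, a more concrete version of that loop could look like the sketch below; the sampling step, the clipping/renormalization (needed because an MSE-trained model doesn't output a true probability distribution), and the hard length cap are my additions:

import numpy as np

alphabet = "BTSXPVE"

def one_hot(letter):
    # Shape (1, 1, 7): batch of 1, one timestep, 7 features
    v = np.zeros((1, 1, len(alphabet)), dtype=np.float32)
    v[0, 0, alphabet.index(letter)] = 1.0
    return v

newModel.reset_states()                      # new sequence -> reset the recurrent state

generated = "B"
x = one_hot("B")
for _ in range(30):                          # safety cap on the generated length
    probs = newModel.predict(x)[0, 0]        # shape (7,)
    probs = np.clip(probs, 0, None)          # MSE outputs are not a real distribution
    probs = probs / probs.sum()
    idx = int(np.random.choice(len(alphabet), p=probs))
    letter = alphabet[idx]
    generated += letter
    if letter == "E":                        # E ends a Reber string
        break
    x = one_hot(letter)

print(generated)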
About the quality of the results... maybe your model is just too tiny for that.
You can try more layers, for instance:
model = Sequential()
model.add(LSTM(units=50, input_shape=(maxlen, 7), return_sequences=True))
model.add(LSTM(units=30, return_sequences=True))
model.add(LSTM(units=7, return_sequences=True))
This choice was arbitrary; I don't know if it's good enough or too much for your data.
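If you go that route, note that the stateful copy used for generation has to mirror the same stack of layers before set_weights will line up; a sketch:

newModel = Sequential()
newModel.add(LSTM(units=50, stateful=True, batch_input_shape=(1, 1, 7), return_sequences=True))
newModel.add(LSTM(units=30, stateful=True, return_sequences=True))
newModel.add(LSTM(units=7, stateful=True, return_sequences=True))
newModel.set_weights(model.get_weights())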
Source: https://stackoverflow.com/questions/46980166/keras-convert-a-trained-many-to-many-model-to-one-to-many-model-generator