I am reading this article (The Unreasonable Effectiveness of Recurrent Neural Networks) and want to understand how to express one-to-one, one-to-many, many-to-one, and many-to-many LSTM neural networks in Keras. I have read a lot about RNNs and understand how LSTM networks work, in particular the vanishing gradient problem, LSTM cells, their outputs and states, sequence outputs, etc. However, I have trouble expressing all these concepts in Keras.
To start with, I have created the following toy NN using an LSTM layer:
from keras.models import Model
from keras.layers import Input, LSTM
import numpy as np
t1 = Input(shape=(2, 3))
t2 = LSTM(1)(t1)
model = Model(inputs=t1, outputs=t2)
inp = np.array([[[1,2,3],[4,5,6]]])
model.predict(inp)
# array([[ 0.0264638]], dtype=float32)
In my example the input shape is 2 by 3. As far as I understand, this means that the input is a sequence of 2 vectors, each with 3 features, and hence my input must be a 3D tensor of shape (n_examples, 2, 3). In terms of 'sequences', the input is a sequence of length 2, and each element in the sequence is expressed by 3 features (please correct me if I am wrong). When I call predict, it returns a 2-dim tensor containing a single scalar. So,
Q1: Is it one-to-one or another type of LSTM network?
Q2: When we say "one/many input and one/many output", what do we mean by "one/many input/output"? One/many scalar(s), vector(s), sequence(s)... one/many what?
Q3: Can someone give a simple working example in Keras for each type of the networks: 1-1, 1-M, M-1, and M-M?
PS: I ask multiple questions in a single thread since they are very close and related to each other.
The distinction between one-to-one, one-to-many, many-to-one and many-to-many only exists for RNNs/LSTMs or other networks that work on sequential (temporal) data. CNNs work on spatial data, where this distinction does not exist. So "many" always involves multiple timesteps, i.e. a sequence.
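To see what "many" means at the tensor level, here is a minimal NumPy sketch of an LSTM forward pass. It is not Keras's implementation: the weights are random and the input/forget/cell/output gate layout is an assumption chosen for illustration. It consumes the question's input of shape (1, 2, 3) one timestep at a time. Keeping every step's output gives the shape Keras returns with return_sequences=True; keeping only the last one gives the default (return_sequences=False), which is why LSTM(1) in the question returns shape (1, 1).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_forward(x, W, U, b, units):
    """Run one LSTM layer over x with shape (batch, timesteps, features).

    Returns (per-step outputs, last output): the two output shapes Keras
    produces with return_sequences=True and return_sequences=False.
    """
    batch, timesteps, _ = x.shape
    h = np.zeros((batch, units))   # hidden state (the layer's output)
    c = np.zeros((batch, units))   # cell state
    outputs = []
    for t in range(timesteps):     # "many": one update per timestep
        z = x[:, t, :] @ W + h @ U + b
        i = sigmoid(z[:, :units])                  # input gate
        f = sigmoid(z[:, units:2 * units])         # forget gate
        g = np.tanh(z[:, 2 * units:3 * units])     # candidate cell values
        o = sigmoid(z[:, 3 * units:])              # output gate
        c = f * c + i * g
        h = o * np.tanh(c)
        outputs.append(h)
    return np.stack(outputs, axis=1), h

rng = np.random.default_rng(0)
units, features = 1, 3                       # same sizes as LSTM(1) on a (2, 3) input
W = rng.normal(size=(features, 4 * units))   # random, untrained weights
U = rng.normal(size=(units, 4 * units))
b = np.zeros(4 * units)

x = np.array([[[1, 2, 3], [4, 5, 6]]], dtype=float)   # 1 example, 2 timesteps, 3 features
seq_out, last_out = lstm_forward(x, W, U, b, units)
print(seq_out.shape)   # (1, 2, 1): one output per timestep
print(last_out.shape)  # (1, 1): only the final output
```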
The different types describe the shape of the input and output and how it is classified. For the input, one means a single input quantity is classified as a closed quantity, and many means a sequence of quantities (e.g. a sequence of images or a sequence of words) is classified as a closed quantity. For the output, one means the output is a scalar, 0 or 1 (binary classification, e.g. is a bird or is not a bird); many means the output is a one-hot encoded vector with one dimension for each class (multiclass classification, e.g. is a sparrow, is a robin, ...), for example 001, 010, 100 for three classes.
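The two target formats described above can be built with NumPy as follows; the labels are made up for illustration. Indexing an identity matrix with integer labels (np.eye(n)[labels]) is one common way to one-hot encode; keras.utils.to_categorical does the same job.

```python
import numpy as np

# Binary case ("one" output here): one scalar target per example.
y_binary = np.array([0, 1, 1, 0])        # hypothetical: bird / not a bird

# Multiclass case ("many" output here): one column per class.
labels = np.array([2, 1, 0])             # hypothetical integer labels for 3 classes
y_onehot = np.eye(3, dtype=int)[labels]  # row k of the identity matrix = class k
print(y_onehot)
# [[0 0 1]
#  [0 1 0]
#  [1 0 0]]
```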
In the following examples, images and sequences of images are used as the quantities to be classified; alternatively you could use words and sequences of words (sentences), or other data:
one-to-one: single images (or words, ...) are classified into a single class (binary classification), e.g. is this a bird or not
one-to-many: single images (or words, ...) are classified into multiple classes
many-to-one: a sequence of images (or words, ...) is classified into a single class (binary classification of a sequence)
many-to-many: a sequence of images (or words, ...) is classified into multiple classes
cf https://www.quora.com/How-can-I-choose-between-one-to-one-one-to-many-many-to-one-many-to-one-and-many-to-many-in-long-short-term-memory-LSTM
one-to-one (a regression example; the LSTM uses its default activation, tanh; loss=mean_squared_error):
from numpy import array
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
# prepare sequence
length = 5
seq = array([i/float(length) for i in range(length)])
X = seq.reshape(len(seq), 1, 1)  # (samples, timesteps=1, features=1)
y = seq.reshape(len(seq), 1)     # (samples, 1)
# define LSTM configuration
n_neurons = length
n_batch = length
n_epoch = 1000
# create LSTM
model = Sequential()
model.add(LSTM(n_neurons, input_shape=(1, 1)))
model.add(Dense(1))  # map the 5 LSTM units to one output per sample
model.compile(loss='mean_squared_error', optimizer='adam')
# train LSTM
model.fit(X, y, epochs=n_epoch, batch_size=n_batch, verbose=2)
# evaluate
result = model.predict(X, batch_size=n_batch, verbose=0)
for value in result:
    print('%.1f' % value[0])
source : https://machinelearningmastery.com/timedistributed-layer-for-long-short-term-memory-networks-in-python/
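Why this counts as one-to-one: each sample is a sequence of exactly one timestep and has exactly one target. The NumPy lines below just repeat the data preparation from the example above and print the resulting shapes.

```python
import numpy as np

length = 5
seq = np.array([i / float(length) for i in range(length)])
X = seq.reshape(len(seq), 1, 1)   # (samples, timesteps=1, features=1): one input step
y = seq.reshape(len(seq), 1)      # (samples, 1): one target per sample
print(X.shape, y.shape)           # (5, 1, 1) (5, 1)
```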
one-to-many uses RepeatVector() to transform a single quantity into a sequence, which is what is needed for multiclass classification:
from keras.models import Sequential
from keras.layers import RepeatVector

# Method of a unittest.TestCase; simple_model_eval() is a helper from the
# test suite on the source page, not part of Keras.
def test_one_to_many(self):
    params = dict(
        input_dims=[1, 10], activation='tanh',
        return_sequences=False, output_dim=3
    )
    number_of_times = 4
    model = Sequential()
    model.add(RepeatVector(number_of_times, input_shape=(10,)))
    relative_error, keras_preds, coreml_preds = simple_model_eval(params, model)
    # print relative_error, '\n', keras_preds, '\n', coreml_preds, '\n'
    for i in range(len(relative_error)):
        self.assertLessEqual(relative_error[i], 0.01)
source: https://www.programcreek.com/python/example/89689/keras.layers.RepeatVector
Alternative one-to-many (number_of_times, input_shape and output_size are placeholders for your own values):
model.add(RepeatVector(number_of_times, input_shape=input_shape))
model.add(LSTM(output_size, return_sequences=True))
source : Many to one and many to many LSTM examples in Keras
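A self-contained sketch of the alternative one-to-many pattern, written with the functional API like the question. The sizes (10 input features, 4 output steps, 3 values per step, 16 LSTM units) are arbitrary assumptions, and the TimeDistributed(Dense(...)) head is an added choice that maps each step's hidden state to the desired output size. The model is untrained; the point is the output shape: one input vector in, a sequence of 4 outputs out.

```python
import numpy as np
from keras.models import Model
from keras.layers import Input, RepeatVector, LSTM, TimeDistributed, Dense

n_features = 10   # size of the single input vector (assumed)
n_steps = 4       # length of the generated output sequence (assumed)
n_outputs = 3     # size of each output element (assumed)

inp = Input(shape=(n_features,))              # "one": a single vector per example
x = RepeatVector(n_steps)(inp)                # (batch, n_steps, n_features)
x = LSTM(16, return_sequences=True)(x)        # one hidden state per step
out = TimeDistributed(Dense(n_outputs))(x)    # (batch, n_steps, n_outputs)
model = Model(inputs=inp, outputs=out)
model.compile(loss='mean_squared_error', optimizer='adam')

preds = model.predict(np.random.rand(2, n_features), verbose=0)
print(preds.shape)  # (2, 4, 3)
```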
many-to-one, binary classification (loss=binary_crossentropy, activation=sigmoid, dimensionality of the fully-connected output layer is 1 (Dense(1)); outputs a scalar between 0 and 1 that is thresholded to the class 0 or 1):
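from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense
# X_train, X_test: integer-encoded sequences padded to length 500; y_train, y_test: 0/1 labels (not defined here)
model = Sequential()
model.add(Embedding(5000, 32, input_length=500))
model.add(LSTM(100, dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X_train, y_train, epochs=3, batch_size=64)
# Final evaluation of the model
scores = model.evaluate(X_test, y_test, verbose=0)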
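Since X_train and y_train are not defined in the snippet above, here is a runnable sketch of the same many-to-one architecture on random stand-in data (the data is hypothetical and meaningless; it only exercises the shapes). It uses the functional API and omits input_length, which newer Keras versions no longer accept on Embedding. The LSTM without return_sequences reads all 500 timesteps and emits only its last output, so each example gets one probability.

```python
import numpy as np
from keras.models import Model
from keras.layers import Input, Embedding, LSTM, Dense

vocab_size, seq_len = 5000, 500
rng = np.random.default_rng(0)
# Hypothetical stand-in data: 32 random integer sequences with random 0/1 labels.
X_train = rng.integers(1, vocab_size, size=(32, seq_len))
y_train = rng.integers(0, 2, size=(32, 1)).astype('float32')

inp = Input(shape=(seq_len,), dtype='int32')   # "many": a sequence of 500 tokens
x = Embedding(vocab_size, 32)(inp)             # (batch, 500, 32)
x = LSTM(100)(x)                               # (batch, 100): last output only
out = Dense(1, activation='sigmoid')(x)        # (batch, 1): "one" probability
model = Model(inputs=inp, outputs=out)
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X_train, y_train, epochs=1, batch_size=32, verbose=0)

probs = model.predict(X_train[:3], verbose=0)
print(probs.shape)  # (3, 1)
```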
many-to-many, multiclass classification (loss=sparse_categorical_crossentropy, activation=softmax, dimensionality of the fully-connected output layer is 7 (Dense(7)); outputs a 7-dimensional vector of class probabilities, one per class). Note that sparse_categorical_crossentropy expects the targets as integer class indices (0 to 6); if the targets are one-hot encoded, use categorical_crossentropy instead:
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense
model = Sequential()
model.add(Embedding(5000, 32, input_length=500))
model.add(LSTM(100, dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(7, activation='softmax'))
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
cf Keras LSTM multiclass classification
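The two cross-entropy losses differ only in the label format they expect. The NumPy sketch below (made-up softmax outputs, not Keras's implementation) computes the cross-entropy from integer labels and from the equivalent one-hot labels and shows they agree.

```python
import numpy as np

# Hypothetical softmax outputs for 2 examples and 7 classes (each row sums to 1).
probs = np.array([
    [0.10, 0.05, 0.60, 0.05, 0.05, 0.10, 0.05],
    [0.20, 0.20, 0.10, 0.10, 0.10, 0.10, 0.20],
])
int_labels = np.array([2, 6])            # label format for sparse_categorical_crossentropy
onehot_labels = np.eye(7)[int_labels]    # label format for categorical_crossentropy

# Integer labels: take the predicted probability of the true class directly.
sparse_ce = -np.log(probs[np.arange(len(int_labels)), int_labels])
# One-hot labels: sum over classes; only the true class has a nonzero weight.
categorical_ce = -np.sum(onehot_labels * np.log(probs), axis=1)

print(np.allclose(sparse_ce, categorical_ce))  # True
```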
Alternative many-to-many using a TimeDistributed layer (cf. https://machinelearningmastery.com/timedistributed-layer-for-long-short-term-memory-networks-in-python/ for a description):
from numpy import array
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import TimeDistributed
from keras.layers import LSTM
# prepare sequence
length = 5
seq = array([i/float(length) for i in range(length)])
X = seq.reshape(1, length, 1)  # (samples=1, timesteps=5, features=1)
y = seq.reshape(1, length, 1)  # one target per timestep
# define LSTM configuration
n_neurons = length
n_batch = 1
n_epoch = 1000
# create LSTM
model = Sequential()
model.add(LSTM(n_neurons, input_shape=(length, 1), return_sequences=True))
model.add(TimeDistributed(Dense(1)))  # same Dense(1) applied at every timestep
model.compile(loss='mean_squared_error', optimizer='adam')
# train LSTM
model.fit(X, y, epochs=n_epoch, batch_size=n_batch, verbose=2)
# evaluate
result = model.predict(X, batch_size=n_batch, verbose=0)
for value in result[0,:,0]:
    print('%.1f' % value)
source : https://machinelearningmastery.com/timedistributed-layer-for-long-short-term-memory-networks-in-python/
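A quick way to tell many-to-many from many-to-one is to compare output shapes. In this untrained sketch (sizes copied from the example above, otherwise illustrative), return_sequences=True makes the LSTM emit one vector per timestep and TimeDistributed(Dense(1)) maps each to a single value; without return_sequences, only the last timestep's output reaches the Dense layer.

```python
import numpy as np
from keras.models import Model
from keras.layers import Input, LSTM, TimeDistributed, Dense

length = 5
inp = Input(shape=(length, 1))

# many-to-many: one output per input timestep
seq_x = LSTM(length, return_sequences=True)(inp)   # (batch, 5, 5)
many_to_many = Model(inputs=inp, outputs=TimeDistributed(Dense(1))(seq_x))

# many-to-one: only the final timestep's output is used
last_x = LSTM(length)(inp)                          # (batch, 5)
many_to_one = Model(inputs=inp, outputs=Dense(1)(last_x))

X = np.linspace(0.0, 0.8, length).reshape(1, length, 1)   # 0.0, 0.2, ..., 0.8
mm_out = many_to_many.predict(X, verbose=0)
mo_out = many_to_one.predict(X, verbose=0)
print(mm_out.shape)  # (1, 5, 1)
print(mo_out.shape)  # (1, 1)
```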