I am trying to implement a CNN + LSTM network using different images as input (not videos). I am using all the layers of VGG-16 with TimeDistributed layers (until its last fully connec