Question
I have a sequence prediction problem in which, given the last n items in a sequence, I need to predict the next item.
I have more than 2 million sequences, each with a different number of timesteps (sequence length): some are just 5 items long, while others are 50/60/100/200, up to 500.
seq_inputs = [
    ["AA1", "BB3", "CC4", …, "DD5"],                                # length/timesteps 5
    ["FF1", "DD3", "FF6", "KK8", "AA5", "CC8", …, "AA2"],           # length/timesteps 50
    ["AA2", "CC8", "CC11", "DD3", "FF6", "AA1", "BB3", …, "DD11"],  # length/timesteps 200
    ...
]  # there are 2 million+ of them
For predicting the next item in a sequence, I trim sequences to a maximum length of 60 with pre/post padding, and take the last element of each sequence as the target.
For example, the X's will be

[[0, 0, 0, …, 'AA1', 'BB3', 'CC4'],                              # length 60
 [0, 0, 0, …, 'FF1', 'DD3', 'FF6', 'KK8', 'AA5', 'CC8'],         # length 60
 [0, 0, 0, …, 'AA2', 'CC8', 'CC11', 'DD3', 'FF6', 'AA1', 'BB3']  # length 60
 ...
]

and y is the last element of each:

['DD5', 'AA2', 'DD11', ...]
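In code, that trim/pad/split step looks roughly like this (a simplified sketch; the helper name is illustrative, the sequences are still string tokens at this point, and 0 stands in for the pad value):

MAX_LEN = 60

def split_and_pad(seq, max_len=MAX_LEN):
    # last item becomes the target; the rest (at most the last 60 items) is the input
    x, y = seq[:-1], seq[-1]
    x = x[-max_len:]
    x = [0] * (max_len - len(x)) + x  # pre-pad shorter sequences with 0
    return x, y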
First I tokenise them and convert them to numeric form using the Keras tokenizer's texts_to_sequences(), and reshape them to 60 timesteps with one feature per step:
X = [
[[0],[0],[0],[0],[1], ..., [10], [200], [5], [3], [90] ],
[[0],[0],[0],[0],[95],..., [15], [4],[11],[78], [43]]
..
..
]
y = [40,3, ... , ... ]
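A minimal sketch of this tokenise/pad/reshape step, assuming tensorflow.keras (variable names are illustrative):

import numpy as np
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

texts = [" ".join(seq) for seq in seq_inputs]     # Tokenizer works on strings
tokenizer = Tokenizer()                           # index 0 is reserved for padding
tokenizer.fit_on_texts(texts)
encoded = tokenizer.texts_to_sequences(texts)

X = [seq[:-1] for seq in encoded]                 # inputs: all but the last item
y = np.array([seq[-1] for seq in encoded])        # targets: the last item
X = pad_sequences(X, maxlen=60, padding='pre', truncating='pre')
X = X.reshape((X.shape[0], 60, 1))                # 60 timesteps, 1 feature, as above

vocabulary_size = len(tokenizer.word_index) + 1   # +1 for the padding index 0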
I am using an LSTM with an Embedding layer, like below:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM

model = Sequential()
model.add(Embedding(vocabulary_size, 32, input_length=seq_len))  # seq_len = 60
model.add(LSTM(80, return_sequences=True))
# ..
# ..
model.fit(train_inputs, train_targets, epochs=50, verbose=1, batch_size=32)
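A minimal runnable variant of this setup for next-item prediction might look like the following (a sketch, not the exact model: it assumes the layers elided with .. above are just a Dense softmax head over the vocabulary, and that the targets are the sparse integer indices from the tokenizer):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense

seq_len = 60
model = Sequential()
# note: Embedding takes integer inputs of shape (samples, seq_len),
# so the extra (..., 1) feature axis from the reshape above is not needed here
model.add(Embedding(vocabulary_size, 32, input_length=seq_len))
model.add(LSTM(80))  # final LSTM returns only the last state when predicting one item
model.add(Dense(vocabulary_size, activation='softmax'))
model.compile(loss='sparse_categorical_crossentropy',
              optimizer='adam', metrics=['accuracy'])
model.fit(train_inputs, train_targets, epochs=50, verbose=1, batch_size=32)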
For my problem of predicting the next item in a sequence, is this approach (trimming sequences to a maximum length of 60 with pre/post padding and taking the last item as the target) a good one? The target sits at a different timestep for each sequence, e.g. the 5th, 50th, or 200th, as in my example.
Should I instead turn every sequence into n-grams/sliding windows? For example, the first sequence of my dataset
["AA1", "BB3", "CC4", …, "DD5"]
with a sliding window of 5 would be converted to
seq_inputs = [
    [0, 0, 0, 0, "AA1"],
    [0, 0, 0, "AA1", "BB3"],
    [0, 0, "AA1", "BB3", "CC4"],
    …,
    ...
]
And similarly all the other sequences would be converted to sliding windows.
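A sketch of that expansion (hypothetical helper; each window is pre-padded with 0 and the item that follows it becomes the target):

def sliding_windows(seq, window=5, pad=0):
    # expand one sequence into (window, next-item) training pairs
    pairs = []
    for i in range(1, len(seq)):
        ctx = seq[max(0, i - window):i]          # up to `window` preceding items
        ctx = [pad] * (window - len(ctx)) + ctx  # pre-pad short contexts
        pairs.append((ctx, seq[i]))              # the next item is the target
    return pairs

# e.g. sliding_windows(["AA1", "BB3", "CC4", "DD5"]) gives
# ([0, 0, 0, 0, "AA1"], "BB3"), ([0, 0, 0, "AA1", "BB3"], "CC4"), ...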
To summarise the problem and questions again:
With the current approach of taking the last element as y, I am stuck at 30% validation accuracy, but my concern is not performance; my concern is whether I am doing it right. So I need guidance on the following:
- Since I need to predict the next item in a sequence, is taking the last item as the output for each sequence the right approach?
- Since my input length varies (from 5 to 500) and I am restricting it to 60 timesteps, should I increase or decrease it?
- Instead of taking the whole sequence, should I take the sliding-window approach I sketched above?
- Will I need a stateful LSTM in the case of sliding windows?
来源:https://stackoverflow.com/questions/64750834/keras-lstm-predicting-next-item-taking-whole-sequences-or-sliding-window-will