Question
I have a sequence prediction problem in which, given the last n items in a sequence, I need to predict the next item.
I have more than 2 million sequences, each with a different number of timesteps (sequence length): some are just 5 items long, while others are 50/60/100/200, up to 500.
seq_inputs = [
    ["AA1", "BB3", "CC4", …, "DD5"],                                # length/timesteps 5
    ["FF1", "DD3", "FF6", "KK8", "AA5", "CC8", …, "AA2"],           # length/timesteps 50
    ["AA2", "CC8", "CC11", "DD3", "FF6", "AA1", "BB3", …, "DD11"],  # length/timesteps 200
    ...
]  # there are 2 million+ of them
For predicting the next item in a sequence, I trim sequences to a maximum length of 60 with pre/post padding, and take the last element of each sequence as the target.
For example, the X's will be

[[0, 0, 0, …, 'AA1', 'BB3', 'CC4'],                              # length 60
 [0, 0, 0, …, 'FF1', 'DD3', 'FF6', 'KK8', 'AA5', 'CC8'],         # length 60
 [0, 0, 0, …, 'AA2', 'CC8', 'CC11', 'DD3', 'FF6', 'AA1', 'BB3']  # length 60
 ...
]

and y is the last element of each:

['DD5', 'AA2', 'DD11', ...]
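In code, that trim/pad/split step looks roughly like this (a simplified sketch; the helper name is illustrative, the sequences are still string tokens at this point, and 0 stands in for the pad value):

MAX_LEN = 60

def split_and_pad(seq, max_len=MAX_LEN):
    # last item becomes the target; the rest (at most the last 60 items) is the input
    x, y = seq[:-1], seq[-1]
    x = x[-max_len:]
    x = [0] * (max_len - len(x)) + x  # pre-pad shorter sequences with 0
    return x, y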
First I tokenise them and convert them to numeric form using the Keras tokenizer's texts_to_sequences(), and reshape them to 60 timesteps with one feature per step:
X = [
[[0],[0],[0],[0],[1], ..., [10], [200], [5], [3], [90] ],
[[0],[0],[0],[0],[95],..., [15], [4],[11],[78], [43]]
..
..
]
y = [40,3, ... , ... ]
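A minimal sketch of this tokenise/pad/reshape step, assuming tensorflow.keras (variable names are illustrative):

import numpy as np
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

texts = [" ".join(seq) for seq in seq_inputs]     # Tokenizer works on strings
tokenizer = Tokenizer()                           # index 0 is reserved for padding
tokenizer.fit_on_texts(texts)
encoded = tokenizer.texts_to_sequences(texts)

X = [seq[:-1] for seq in encoded]                 # inputs: all but the last item
y = np.array([seq[-1] for seq in encoded])        # targets: the last item
X = pad_sequences(X, maxlen=60, padding='pre', truncating='pre')
X = X.reshape((X.shape[0], 60, 1))                # 60 timesteps, 1 feature, as above

vocabulary_size = len(tokenizer.word_index) + 1   # +1 for the padding index 0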
I am using an LSTM with an Embedding layer, like below:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM

model = Sequential()
model.add(Embedding(vocabulary_size, 32, input_length=seq_len))  # seq_len = 60
model.add(LSTM(80, return_sequences=True))
# ..
# ..
model.fit(train_inputs, train_targets, epochs=50, verbose=1, batch_size=32)
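A minimal runnable variant of this setup for next-item prediction might look like the following (a sketch, not the exact model: it assumes the layers elided with .. above are just a Dense softmax head over the vocabulary, and that the targets are the sparse integer indices from the tokenizer):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense

seq_len = 60
model = Sequential()
# note: Embedding takes integer inputs of shape (samples, seq_len),
# so the extra (..., 1) feature axis from the reshape above is not needed here
model.add(Embedding(vocabulary_size, 32, input_length=seq_len))
model.add(LSTM(80))  # final LSTM returns only the last state when predicting one item
model.add(Dense(vocabulary_size, activation='softmax'))
model.compile(loss='sparse_categorical_crossentropy',
              optimizer='adam', metrics=['accuracy'])
model.fit(train_inputs, train_targets, epochs=50, verbose=1, batch_size=32)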
For my problem of predicting the next item in a sequence, is this approach (trimming sequences to a maximum length of 60 with pre/post padding and taking the last item as the target) a good one? The target sits at a different timestep for each sequence, e.g. the 5th, 50th, or 200th, as in my example.
Should I instead turn every sequence into n-grams/sliding windows? For example, the first sequence of my dataset
["AA1", "BB3", "CC4", …, "DD5"]
with a sliding window of 5 would be converted to
seq_inputs = [
    [0, 0, 0, 0, "AA1"],
    [0, 0, 0, "AA1", "BB3"],
    [0, 0, "AA1", "BB3", "CC4"],
    …,
    ...
]
And similarly all the other sequences would be converted to sliding windows.
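A sketch of that expansion (hypothetical helper; each window is pre-padded with 0 and the item that follows it becomes the target):

def sliding_windows(seq, window=5, pad=0):
    # expand one sequence into (window, next-item) training pairs
    pairs = []
    for i in range(1, len(seq)):
        ctx = seq[max(0, i - window):i]          # up to `window` preceding items
        ctx = [pad] * (window - len(ctx)) + ctx  # pre-pad short contexts
        pairs.append((ctx, seq[i]))              # the next item is the target
    return pairs

# e.g. sliding_windows(["AA1", "BB3", "CC4", "DD5"]) gives
# ([0, 0, 0, 0, "AA1"], "BB3"), ([0, 0, 0, "AA1", "BB3"], "CC4"), ...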
To summarise the problem and questions again:
With the current approach of taking the last element as y, I am stuck at 30% validation accuracy, but my concern is not performance; my concern is whether I am doing it right. So I need guidance on the following:
- Since I need to predict the next item in a sequence, is taking the last item as the output for each sequence the right approach?
- Since my input length varies (from 5 to 500) and I am restricting it to 60 timesteps, should I increase or decrease it?
- Instead of taking the whole sequence, should I take the sliding-window approach I sketched above?
- Will I need a stateful LSTM in the case of sliding windows?
来源:https://stackoverflow.com/questions/64750834/keras-lstm-predicting-next-item-taking-whole-sequences-or-sliding-window-will