问题
I have a binary classification problem. I use the following keras model to do my classification.
input1 = Input(shape=(25,6))
x1 = LSTM(200)(input1)
input2 = Input(shape=(24,6))
x2 = LSTM(200)(input2)
input3 = Input(shape=(21,6))
x3 = LSTM(200)(input3)
input4 = Input(shape=(20,6))
x4 = LSTM(200)(input4)
x = concatenate([x1,x2,x3,x4])
x = Dropout(0.2)(x)
x = Dense(200)(x)
x = Dropout(0.2)(x)
output = Dense(1, activation='sigmoid')(x)
However, the results I get is extremely bad. I thought the reason is that I have too many features, thus, needs have more improved layers after the concatenate
.
I was also thinking if it would be helpful to used a flatten() layer after the concatenate
.
anyway, since I am new to deep learning, I am not so sure how to make this a better model.
I am happy to provide more details if needed.
回答1:
Here is what I can suggest
Remove every things that prevent overfitting, such as Dropout and regularizer. What can happen is that your model may not be able to capture the complexity of your data using given layer, so you need to make sure that your model is able to overfit first before adding regularizer.
Now try increase number of Dense layer and number of neuron in each layer until you can see some improvement. There is also a possibility that your data is too noisy or you have only few data to train the model so you can't even produce a useful predictions.
Now if you are LUCKY and you can see overfitting, you can add Dropout and regularizer.
Because every neural network is a gradient base algorithm, you may end up at local minimum. You may also need to run the algorithm multiple times with different initial weight before you can get a good result or You can change your loss function so that you have a convex problem where local minimum is global minimum.
If you can't achieve better result
You may need to try different topology because LSTM is just trying to model a system that assume to have Markov property. you can look at nested-LSTM or something like that, which model the system in the way that next time step is not just depend on current time step.
回答2:
The Dropout
right before the output layer could be problematic. I would suggest removing both Dropout
layers and evaluating performance, then re-introduce regularization once the model is performing well on the the training set.
来源:https://stackoverflow.com/questions/60859157/how-to-handle-lstms-with-many-features-in-python