Question
My neural net is a slightly modified version of the model proposed in this paper: https://arxiv.org/pdf/1606.01781.pdf
My goal is to classify text into 9 different categories. I'm using 29 convolutional layers and have capped the length of any text at 256 characters.
The training data has 900k samples and the validation data 35k. The data is quite imbalanced, so I have done some data augmentation to balance the training data (I have not touched the validation data, obviously) and then used class weights during training.
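The class weights mentioned above can be derived in several ways; a common choice is inverse-frequency weighting, the same formula scikit-learn uses for `compute_class_weight(class_weight='balanced')`. A minimal sketch (the toy label distribution here is made up for illustration):

```python
from collections import Counter

def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count).
    Rare classes get weights > 1, common classes < 1."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}

# Hypothetical imbalanced 3-class label set: 80 / 15 / 5 samples
labels = [0] * 80 + [1] * 15 + [2] * 5
weights = balanced_class_weights(labels)
# weights[0] ≈ 0.42 (majority, down-weighted), weights[2] ≈ 6.67 (rare, up-weighted)
```

In Keras, a dict like this can be passed to `model.fit(..., class_weight=weights)` so the loss of under-represented classes counts proportionally more.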
Layer (type) Output Shape Param #
=================================================================
input (InputLayer) (None, 256) 0
_________________________________________________________________
embedding_1 (Embedding) (None, 256, 16) 1152
_________________________________________________________________
conv1d_1 (Conv1D) (None, 256, 64) 3136
_________________________________________________________________
sequential_1 (Sequential) (None, 256, 64) 25216
_________________________________________________________________
sequential_2 (Sequential) (None, 256, 64) 25216
_________________________________________________________________
sequential_3 (Sequential) (None, 256, 64) 25216
_________________________________________________________________
sequential_4 (Sequential) (None, 256, 64) 25216
_________________________________________________________________
sequential_5 (Sequential) (None, 256, 64) 25216
_________________________________________________________________
sequential_6 (Sequential) (None, 256, 64) 25216
_________________________________________________________________
sequential_7 (Sequential) (None, 256, 64) 25216
_________________________________________________________________
sequential_8 (Sequential) (None, 256, 64) 25216
_________________________________________________________________
sequential_9 (Sequential) (None, 256, 64) 25216
_________________________________________________________________
sequential_10 (Sequential) (None, 256, 64) 25216
_________________________________________________________________
max_pooling1d_1 (MaxPooling1 (None, 128, 64) 0
_________________________________________________________________
sequential_11 (Sequential) (None, 128, 128) 75008
_________________________________________________________________
sequential_12 (Sequential) (None, 128, 128) 99584
_________________________________________________________________
sequential_13 (Sequential) (None, 128, 128) 99584
_________________________________________________________________
sequential_14 (Sequential) (None, 128, 128) 99584
_________________________________________________________________
sequential_15 (Sequential) (None, 128, 128) 99584
_________________________________________________________________
sequential_16 (Sequential) (None, 128, 128) 99584
_________________________________________________________________
sequential_17 (Sequential) (None, 128, 128) 99584
_________________________________________________________________
sequential_18 (Sequential) (None, 128, 128) 99584
_________________________________________________________________
sequential_19 (Sequential) (None, 128, 128) 99584
_________________________________________________________________
sequential_20 (Sequential) (None, 128, 128) 99584
_________________________________________________________________
max_pooling1d_2 (MaxPooling1 (None, 64, 128) 0
_________________________________________________________________
sequential_21 (Sequential) (None, 64, 256) 297472
_________________________________________________________________
sequential_22 (Sequential) (None, 64, 256) 395776
_________________________________________________________________
sequential_23 (Sequential) (None, 64, 256) 395776
_________________________________________________________________
sequential_24 (Sequential) (None, 64, 256) 395776
_________________________________________________________________
max_pooling1d_3 (MaxPooling1 (None, 32, 256) 0
_________________________________________________________________
sequential_25 (Sequential) (None, 32, 512) 1184768
_________________________________________________________________
sequential_26 (Sequential) (None, 32, 512) 1577984
_________________________________________________________________
sequential_27 (Sequential) (None, 32, 512) 1577984
_________________________________________________________________
sequential_28 (Sequential) (None, 32, 512) 1577984
_________________________________________________________________
lambda_1 (Lambda) (None, 4096) 0
_________________________________________________________________
dense_1 (Dense) (None, 2048) 8390656
_________________________________________________________________
dense_2 (Dense) (None, 2048) 4196352
_________________________________________________________________
dense_3 (Dense) (None, 9) 18441
=================================================================
Total params: 21,236,681
Trainable params: 21,216,713
Non-trainable params: 19,968
With the given model I'm getting the following results:
The loss curves look weird to me: I can't spot the typical overfitting pattern, yet the gap between training and validation loss is huge. Also, the training loss at epoch 1 is already way lower than the validation loss ever gets.
Is this something I should be worried about, and how could I improve my model?
Thanks!
Answer 1:
To close the gap between training and validation error, I would suggest two things:
- try using more data, if you can find any
- use dropout layers between the dense layers for regularization
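To see why dropout helps, here is a minimal NumPy sketch of inverted dropout, the mechanism behind Keras's `Dropout` layer: during training a random fraction `rate` of activations is zeroed and the survivors are scaled by `1/(1-rate)`, so the expected activation is unchanged and no rescaling is needed at inference. The array shapes below are arbitrary, chosen to echo the 2048-unit dense layers in the question.

```python
import numpy as np

def dropout(x, rate, rng, training=True):
    """Inverted dropout: zero a fraction `rate` of units, scale the
    survivors by 1/(1-rate) so E[output] == E[input]."""
    if not training or rate == 0.0:
        return x  # identity at inference time
    keep = 1.0 - rate
    mask = rng.random(x.shape) < keep  # Bernoulli(keep) mask
    return x * mask / keep

rng = np.random.default_rng(0)
x = np.ones((1000, 2048))
y = dropout(x, rate=0.5, rng=rng)
# roughly half the units are zeroed, yet the mean activation stays near 1.0
```

In the model above this would translate to inserting `Dropout(0.5)` layers between `dense_1`, `dense_2`, and `dense_3`; 0.5 is a common starting rate for large dense layers, but it is a hyperparameter worth tuning.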
Source: https://stackoverflow.com/questions/51023032/deep-learning-big-difference-between-training-and-validation-loss-from-the-very