I am training a small network and the training seems to go fine: the validation loss decreases, I reach a validation accuracy around 80%, and it actually stops training once there is no more improvement (early stopping).
One of the problems that could lead to such behavior is an imbalanced dataset. Your model has found out that if it predicts the dominant class every time, it gets good results.
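A quick way to confirm this is to look at the label distribution before training. This is a minimal sketch, assuming your labels are integer-encoded in an array called y_train (the name is illustrative):
import numpy as np

unique, counts = np.unique(y_train, return_counts=True)
print(dict(zip(unique, counts)))  # e.g. {0: 8000, 1: 2000} would explain a constant ~80% accuracy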
There are many ways to tackle an imbalanced dataset. Here is a good tutorial.
One of the easiest yet most powerful solutions is to apply a higher penalty to the loss when the model wrongly predicts the smaller class. This can be implemented in Keras by setting the class_weight parameter in the fit or fit_generator function.
It can be a dictionary, for example:
class_weight = {0: 0.75, 1: 0.25}  # does not necessarily need to add up to 1
history = model.fit_generator(train_generator,
                              steps_per_epoch=train_generator.n // train_generator.batch_size,
                              epochs=epochs,
                              class_weight=class_weight,  # this is the important part
                              validation_data=val_generator,
                              validation_steps=val_generator.n // val_generator.batch_size,
                              callbacks=[earlyStopping, mcp_save])  # , reduce_lr_loss])
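If you would rather not pick the weights by hand, scikit-learn can derive them from the label frequencies. A minimal sketch, again assuming integer-encoded labels in y_train:
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

classes = np.unique(y_train)
weights = compute_class_weight(class_weight='balanced', classes=classes, y=y_train)
class_weight = dict(zip(classes, weights))  # pass this dict to fit/fit_generator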
Adding to Coderji's answer, it might also prove advantageous to counter class imbalance with stratified k-fold cross-validation, with k = 5 being common practice. This splits your data set into k splits like regular cross-validation, but also stratifies these splits: each of them preserves the class proportions of the full data set, so the minority class is represented in every fold instead of being over- or under-represented by chance.
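To see what the stratification actually does, you can compare the per-fold class counts of a plain KFold against a StratifiedKFold. A minimal sketch with toy 80/20 labels (X is a dummy array; split() only needs its length):
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold

y = np.array([0] * 80 + [1] * 20)  # toy imbalanced label set
X = np.zeros((len(y), 1))          # dummy features

for name, cv in [("KFold", KFold(n_splits=5, shuffle=True)),
                 ("StratifiedKFold", StratifiedKFold(n_splits=5, shuffle=True))]:
    print(name)
    for _, val_indices in cv.split(X, y):
        print(np.bincount(y[val_indices]))  # every stratified fold comes out [16, 4]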
As of yet, Keras does not have its own way to do stratified k-fold cross-validation. Instead, it is advised to use sklearn's StratifiedKFold. This article gives a detailed overview of how to achieve this in Keras, with the gist of it being:
from sklearn.model_selection import StratifiedKFold

# Instantiate the cross-validator
skf = StratifiedKFold(n_splits=kfold_splits, shuffle=True)

# Loop through the (train, validation) index pairs the split() method returns
for index, (train_indices, val_indices) in enumerate(skf.split(X, y)):
    print("Training on fold " + str(index + 1) + "/" + str(kfold_splits) + "...")

    # Generate batches from indices
    xtrain, xval = X[train_indices], X[val_indices]
    ytrain, yval = y[train_indices], y[val_indices]

    # Clear the model, and create it anew for this fold
    model = None
    model = create_model()

    # Optional debug message:
    # print("Training new iteration on " + str(xtrain.shape[0]) + " training samples, "
    #       + str(xval.shape[0]) + " validation samples, this may be a while...")

    history = train_model(model, xtrain, ytrain, xval, yval)

    accuracy_history = history.history['acc']
    val_accuracy_history = history.history['val_acc']
    print("Last training accuracy: " + str(accuracy_history[-1])
          + ", last validation accuracy: " + str(val_accuracy_history[-1]))
Here, create_model() returns a compiled Keras model, and train_model() returns the History object of its final model.fit() call.
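For completeness, here is a hypothetical sketch of what those two helpers could look like; the architecture, optimizer, and epoch count are placeholders, not taken from the article:
from keras.models import Sequential
from keras.layers import Dense

num_features = X.shape[1]  # assumes a 2-D feature matrix X, as used above

def create_model():
    # Return a freshly compiled model so every fold starts from scratch
    model = Sequential([
        Dense(32, activation='relu', input_shape=(num_features,)),
        Dense(1, activation='sigmoid'),
    ])
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['acc'])
    return model

def train_model(model, xtrain, ytrain, xval, yval):
    # Return the History object of the fit() call, as used above
    return model.fit(xtrain, ytrain, validation_data=(xval, yval),
                     epochs=10, batch_size=32)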