Question
I am working on an image-segmentation application where the loss function is Dice loss. The problem is that the loss becomes NaN after some epochs. I am doing 5-fold cross-validation and monitoring the training and validation losses for each fold. For some folds the loss becomes NaN quickly, while for others it takes a while to reach NaN. I added a smoothing constant to the loss formulation to avoid over/underflow, but the same problem still occurs. My inputs are scaled to the range [-1, 1]. I have seen people suggest regularizers and different optimizers, but I don't understand why the loss becomes NaN in the first place. I have pasted the loss function, and the training and validation losses for some epochs, below. Initially only the validation loss and the validation dice score become NaN, but later all metrics become NaN.
import tensorflow as tf

def dice_loss(y_true, y_pred):
    # y_true: ground truth, y_pred: predictions
    smooth = 1.  # smoothing constant to avoid division by zero
    y_true_f = tf.keras.backend.flatten(y_true)
    y_pred_f = tf.keras.backend.flatten(y_pred)
    intersection = tf.keras.backend.sum(y_true_f * y_pred_f)
    return 1 - (2. * intersection + smooth) / (
        tf.keras.backend.sum(y_true_f) + tf.keras.backend.sum(y_pred_f) + smooth)
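One way to pin down where the NaN first originates is to fail fast with tf.debugging.check_numerics. A minimal sketch (the wrapper name checked_dice_loss is hypothetical, not part of my code) that raises an error the moment the predictions or the loss contain NaN/Inf, instead of letting NaN propagate silently into the metrics:

import tensorflow as tf

def checked_dice_loss(y_true, y_pred):
    # Raise InvalidArgumentError as soon as a NaN/Inf appears in the
    # predictions or in the computed loss value.
    y_pred = tf.debugging.check_numerics(y_pred, "y_pred contains NaN/Inf")
    loss = dice_loss(y_true, y_pred)
    return tf.debugging.check_numerics(loss, "dice_loss produced NaN/Inf")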
epoch  train_dice_score  train_loss   val_dice_score  val_loss
0      0.42387727        0.423877264  0.35388064      0.353880603
1      0.23064087        0.230640889  0.21502239      0.215022382
2      0.17881058        0.178810576  0.1767999       0.176799848
3      0.15746565        0.157465705  0.16138957      0.161389555
4      0.13828343        0.138283484  0.12770002      0.127699989
5      0.10434002        0.104340041  0.0981831       0.098183098
6      0.08013707        0.080137035  0.08188484      0.081884826
7      0.07081806        0.070818066  0.070421465     0.070421467
8      0.058371827       0.058371854  0.060712796     0.060712777
9      0.06381426        0.063814262  nan             nan
10     0.105625264       0.105625251  nan             nan
11     0.10790708        0.107907102  nan             nan
12     0.10719114        0.10719115   nan             nan
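To catch the first bad batch rather than waiting for the epoch-level log above, Keras ships a TerminateOnNaN callback. A hedged sketch, assuming a standard model.fit training loop; the tiny model and random data here are placeholders for the real segmentation setup:

import numpy as np
import tensorflow as tf

# Placeholder model/data; substitute the real segmentation network and folds.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(1, 1, activation="sigmoid", input_shape=(32, 32, 1)),
])
model.compile(optimizer="adam", loss=dice_loss)

x = np.random.rand(16, 32, 32, 1).astype("float32")
y = (np.random.rand(16, 32, 32, 1) > 0.5).astype("float32")

# TerminateOnNaN halts training the first time a batch loss becomes NaN,
# which shows exactly when the blow-up starts.
model.fit(x, y, validation_split=0.25, epochs=5,
          callbacks=[tf.keras.callbacks.TerminateOnNaN()])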
Source: https://stackoverflow.com/questions/62259112/dice-loss-becomes-nan-after-some-epochs