Question
I am working on an image-segmentation application where the loss function is Dice loss. The problem is that the loss becomes NaN after some epochs. I am doing 5-fold cross-validation and monitoring the training and validation losses for each fold. For some folds the loss becomes NaN quickly, while for others it takes a while to reach NaN. I added a smoothing constant to the loss formulation to avoid over/underflow, but the same problem still occurs. My inputs are scaled to the range [-1, 1]. I have seen people suggest regularizers and different optimizers, but I don't understand why the loss becomes NaN in the first place. I have pasted the loss function, and the training and validation losses for some epochs, below. Initially only the validation loss and the validation dice score become NaN, but later all metrics become NaN.
import tensorflow as tf

def dice_loss(y_true, y_pred):
    # y_true: ground truth, y_pred: predictions
    smooth = 1.  # smoothing constant to avoid division by zero
    y_true_f = tf.keras.backend.flatten(y_true)
    y_pred_f = tf.keras.backend.flatten(y_pred)
    intersection = tf.keras.backend.sum(y_true_f * y_pred_f)
    return 1 - (2. * intersection + smooth) / (
        tf.keras.backend.sum(y_true_f) + tf.keras.backend.sum(y_pred_f) + smooth)
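One way to pin down where the NaN first originates is to fail fast with tf.debugging.check_numerics. A minimal sketch (the wrapper name checked_dice_loss is hypothetical, not part of my code) that raises an error the moment the predictions or the loss contain NaN/Inf, instead of letting NaN propagate silently into the metrics:

import tensorflow as tf

def checked_dice_loss(y_true, y_pred):
    # Raise InvalidArgumentError as soon as a NaN/Inf appears in the
    # predictions or in the computed loss value.
    y_pred = tf.debugging.check_numerics(y_pred, "y_pred contains NaN/Inf")
    loss = dice_loss(y_true, y_pred)
    return tf.debugging.check_numerics(loss, "dice_loss produced NaN/Inf")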
epoch  train_dice_score  train_loss   val_dice_score  val_loss
0      0.42387727        0.423877264  0.35388064      0.353880603
1      0.23064087        0.230640889  0.21502239      0.215022382
2      0.17881058        0.178810576  0.1767999       0.176799848
3      0.15746565        0.157465705  0.16138957      0.161389555
4      0.13828343        0.138283484  0.12770002      0.127699989
5      0.10434002        0.104340041  0.0981831       0.098183098
6      0.08013707        0.080137035  0.08188484      0.081884826
7      0.07081806        0.070818066  0.070421465     0.070421467
8      0.058371827       0.058371854  0.060712796     0.060712777
9      0.06381426        0.063814262  nan             nan
10     0.105625264       0.105625251  nan             nan
11     0.10790708        0.107907102  nan             nan
12     0.10719114        0.10719115   nan             nan
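To catch the first bad batch rather than waiting for the epoch-level log above, Keras ships a TerminateOnNaN callback. A hedged sketch, assuming a standard model.fit training loop; the tiny model and random data here are placeholders for the real segmentation setup:

import numpy as np
import tensorflow as tf

# Placeholder model/data; substitute the real segmentation network and folds.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(1, 1, activation="sigmoid", input_shape=(32, 32, 1)),
])
model.compile(optimizer="adam", loss=dice_loss)

x = np.random.rand(16, 32, 32, 1).astype("float32")
y = (np.random.rand(16, 32, 32, 1) > 0.5).astype("float32")

# TerminateOnNaN halts training the first time a batch loss becomes NaN,
# which shows exactly when the blow-up starts.
model.fit(x, y, validation_split=0.25, epochs=5,
          callbacks=[tf.keras.callbacks.TerminateOnNaN()])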
Source: https://stackoverflow.com/questions/62259112/dice-loss-becomes-nan-after-some-epochs