from_logits=True and from_logits=False get different training result for tf.losses.CategoricalCrossentropy for UNet

I am doing the image semantic segmentation job with unet, if I set the Softmax Activation for last layer like this:

...
conv9 = Conv2D(n_classes, (3,3), padding = 'same')(conv9)
conv10 = (Activation('softmax'))(conv9)
model = Model(inputs, conv10)
return model
...

and then using loss = tf.keras.losses.CategoricalCrossentropy(from_logits=False) The training will not converge even for only one training image.

But if I do not set the Softmax Activation for last layer like this:

...
conv9 = Conv2D(n_classes, (3,3), padding = 'same')(conv9)
model = Model(inputs, conv9)
return model
...

and then using loss = tf.keras.losses.CategoricalCrossentropy(from_logits=True) The training will converge for one training image.

My groundtruth dataset is generated like this:

X = []
Y = []
im = cv2.imread(impath)
X.append(im)
seg_labels = np.zeros((height, width, n_classes))
for spath in segpaths:
    mask = cv2.imread(spath, 0)
    seg_labels[:, :, c] += mask
Y.append(seg_labels.reshape(width*height, n_classes))

Why? Is there something wrong for my usage?

This is my experiment code of git: https://github.com/honeytidy/unet You can checkout and run (can run on cpu). You can change the Activation layer and from_logits of CategoricalCrossentropy and see what i said.

Pushing the "softmax" activation into the cross-entropy loss layer significantly simplifies the loss computation and makes it more numerically stable.
It might be the case that in your example the numerical issues are significant enough to render the training process ineffective for the from_logits=False option.

You can find a derivation of the cross entropy loss (a special case of "info gain" loss) in this post. This derivation illustrates the numerical issues that are averted when combining softmax with cross entropy loss.

I guess the problem comes from the softmax activation function. Looking at the doc I found that sotmax is applied to the last axis by default. Can you look at model.summary() and check if that is what you want ?

For softmax to work properly, you must make sure that:

You are using 'channels_last' as Keras default channel config.
- This means the shapes in the model will be like (None, height, width, channels)
- This seems to be your case because you are putting n_classes in the last axis. But it's also strange because you are using Conv2D and your output Y should be (1, height, width, n_classes) and not that strange shape you are using.
Your Y has only zeros and ones (not 0 and 255 as usually happens to images)
- Check that Y.max() == 1 and Y.min() == 0
- You may need to have Y = Y / 255.
Only one class is correct (your data does not have more than one path/channel with value = 1).
- Check that (Y.sum(axis=-1) == 1).all() is True

来源：https://stackoverflow.com/questions/57253841/from-logits-true-and-from-logits-false-get-different-training-result-for-tf-loss

标签

python

tensorflow

keras

image-segmentation

tf.keras