I am doing image semantic segmentation with a U-Net. If I add a softmax activation
to the last layer like this:
...
conv9 = Conv2D(n_classes, (3,3), padding = 'same')(conv9)
conv10 = (Activation('softmax'))(conv9)
model = Model(inputs, conv10)
return model
...
and then use loss = tf.keras.losses.CategoricalCrossentropy(from_logits=False),
the training does not converge, even with only one training image.
But if I leave the softmax activation out
of the last layer like this:
...
conv9 = Conv2D(n_classes, (3,3), padding = 'same')(conv9)
model = Model(inputs, conv9)
return model
...
and then use loss = tf.keras.losses.CategoricalCrossentropy(from_logits=True),
the training converges for one training image.
My ground-truth dataset is generated like this:
X = []
Y = []
im = cv2.imread(impath)
X.append(im)
seg_labels = np.zeros((height, width, n_classes))
for spath in segpaths:
    mask = cv2.imread(spath, 0)  # flag 0 reads the mask as grayscale
    # c is the class index for this mask (not defined in this excerpt)
    seg_labels[:, :, c] += mask
Y.append(seg_labels.reshape(width*height, n_classes))
Why? Is there something wrong with my usage?
My experiment code is on GitHub: https://github.com/honeytidy/unet You can check it out and run it (it runs on CPU). Change the Activation layer and the from_logits argument of CategoricalCrossentropy to see what I described.
Pushing the "softmax" activation into the cross-entropy loss layer significantly simplifies the loss computation and makes it more numerically stable.
It might be the case that in your example the numerical issues are significant enough to render the training process ineffective for the from_logits=False option.
You can find a derivation of the cross entropy loss (a special case of "info gain" loss) in this post. This derivation illustrates the numerical issues that are averted when combining softmax with cross entropy loss.
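As a rough illustration of the numerical point above, here is a minimal pure-Python sketch comparing the two routes for a single pixel. It assumes the probability-based route clips its input to a small epsilon before taking the log, as Keras-style implementations have commonly done. The value 1e-7 and the helper names `ce_from_probs` and `ce_from_logits` are assumptions for illustration, not taken from the question's code.

```python
import math

EPS = 1e-7  # assumed clipping constant, for illustration only


def softmax(logits):
    # Subtract the max so that exp() cannot overflow.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ce_from_probs(logits, true_idx, eps=EPS):
    # Route 1: softmax first, then clip the probability, then -log(p).
    p = softmax(logits)[true_idx]
    p = min(max(p, eps), 1.0 - eps)
    return -math.log(p)


def ce_from_logits(logits, true_idx):
    # Route 2: fused log-softmax via the log-sum-exp trick.
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return lse - logits[true_idx]


# A confidently wrong prediction: the true class is index 1.
logits = [50.0, 0.0, -5.0]
loss_probs = ce_from_probs(logits, 1)
loss_logits = ce_from_logits(logits, 1)
print(loss_probs, loss_logits)
```

Under these assumptions the clipped route saturates at -log(EPS), while the fused route still reports a loss close to 50. Once the probability is clipped, the loss no longer changes with the logits, which is one way training could stall.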
I guess the problem comes from the softmax activation function. Looking at the docs, I found that softmax is applied to the last axis by default. Can you look at model.summary() and check whether that is what you want?
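To see why the axis matters, here is a small NumPy sketch (NumPy only, no Keras) of softmax over the last axis. The shapes and the local `softmax` helper are made up for illustration. When the classes sit on the last axis, each pixel's probabilities sum to 1. When they sit on another axis, the same call normalizes over the wrong dimension.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


rng = np.random.default_rng(0)

# Channels-last layout: (batch, height, width, n_classes).
logits = rng.normal(size=(1, 4, 5, 3))
probs = softmax(logits, axis=-1)
per_pixel_sum = probs.sum(axis=-1)  # shape (1, 4, 5), all ones

# Channels-first layout: classes are on axis 1, so the last axis is width.
logits_cf = np.transpose(logits, (0, 3, 1, 2))  # (1, 3, 4, 5)
wrong = softmax(logits_cf, axis=-1)             # normalizes over width
class_sum_wrong = wrong.sum(axis=1)             # per-pixel sum over classes
```

In the second case the per-pixel class scores no longer form a probability distribution, so a cross-entropy loss computed from them would not mean what it should.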
For softmax to work properly, you must make sure that:
- You are using 'channels_last' as the Keras default channel config.
  - This means the shapes in the model will be like (None, height, width, channels)
  - This seems to be your case because you are putting n_classes in the last axis. But it's also strange because you are using Conv2D and your output Y should be (1, height, width, n_classes) and not that strange shape you are using.
- Your Y has only zeros and ones (not 0 and 255 as usually happens to images)
  - Check that Y.max() == 1 and Y.min() == 0
  - You may need to have Y = Y / 255.
- Only one class is correct (your data does not have more than one path/channel with value = 1).
  - Check that (Y.sum(axis=-1) == 1).all() is True
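The label checks above can be bundled into a small helper. This is a sketch using NumPy only. The function name `check_one_hot_labels` and the toy 2x3 mask are hypothetical, and the example assumes masks are stored as 0/255 images, as the answer suspects.

```python
import numpy as np


def check_one_hot_labels(Y):
    """Report whether Y looks like valid one-hot segmentation labels."""
    Y = np.asarray(Y)
    return {
        # Every value must be exactly 0 or 1.
        "binary": bool(np.isin(Y, (0, 1)).all()),
        # Exactly one class must be active at each pixel.
        "one_class_per_pixel": bool((Y.sum(axis=-1) == 1).all()),
    }


# Toy ground truth: a 2x3 image with 2 classes.
n_classes = 2
class_ids = np.array([[0, 1, 1],
                      [1, 0, 0]])
one_hot = np.eye(n_classes)[class_ids]  # shape (2, 3, 2)

# Masks read from image files often hold 0/255 rather than 0/1.
raw = one_hot * 255.0
report_raw = check_one_hot_labels(raw)
report_scaled = check_one_hot_labels(raw / 255.0)
print(report_raw, report_scaled)
```

If either check fails on the real Y, it is worth fixing the labels before comparing the two from_logits settings.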
Source: https://stackoverflow.com/questions/57253841/from-logits-true-and-from-logits-false-get-different-training-result-for-tf-loss