In Caffe, the SoftmaxWithLoss layer has an option to ignore all negative labels (-1) when computing probabilities, so that only the probabilities of labels 0 and above add up to 1.
Is there a similar feature with Tensorflow softmax loss?
Just came up with a workaround: I created a one-hot tensor from the label indices using tf.one_hot (with the depth set to the number of labels). tf.one_hot automatically produces an all-zero row for every -1 index in the resulting one-hot tensor (of shape [batch, number of labels]).
This enables the softmax loss (i.e. tf.nn.softmax_cross_entropy_with_logits) to "ignore" all -1 labels.
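A minimal sketch of that workaround (shapes, tensor names, and the example values are assumptions for illustration, not taken from the question):

```python
import tensorflow as tf

num_classes = 5
labels = tf.constant([2, -1, 0, 4])           # -1 marks the samples to be "ignored"
logits = tf.random.normal([4, num_classes])   # [batch, num_classes]

# tf.one_hot maps the -1 index to an all-zero row, so that sample's
# target distribution is the zero vector.
one_hot_labels = tf.one_hot(labels, depth=num_classes)

# Per-sample cross-entropy; the all-zero target rows yield a loss value of 0.
loss = tf.nn.softmax_cross_entropy_with_logits(labels=one_hot_labels,
                                                logits=logits)
```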
I am not quite sure that your workaround is actually working.
Caffe's ignore_label semantically means "the label of a sample that has to be ignored"; its effect is that the gradient for that sample is not backpropagated, which is in no way guaranteed by the use of a one-hot vector.
On the one hand, I expect any meaningful model to quickly learn to predict a zero, or small enough, value for that specific entry, because all samples have a zero in that entry, so the backpropagated error on that prediction will vanish relatively fast.
On the other hand, you need to be aware that, from a mathematical point of view, Caffe's ignore_label and what you are doing are totally different.
That said, I am new to TF and need the exact same feature as Caffe's ignore_label.
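For illustration of the distinction, here is a hedged sketch (not something either answer proposes) of actually dropping such samples before the loss, assuming integer labels with -1 as the ignore value; masked-out rows then contribute neither loss nor gradient, which is closer in spirit to Caffe's ignore_label:

```python
import tensorflow as tf

labels = tf.constant([2, -1, 0, 4])          # -1 marks samples to ignore
logits = tf.random.normal([4, 5])            # [batch, num_classes]

# Keep only the rows whose label is not -1.
keep = tf.not_equal(labels, -1)
kept_labels = tf.boolean_mask(labels, keep)
kept_logits = tf.boolean_mask(logits, keep)

# Cross-entropy over the kept samples only.
loss = tf.nn.sparse_softmax_cross_entropy_with_logits(
    labels=kept_labels, logits=kept_logits)
```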