Question
Computer vision and deep learning literature usually says one should use binary_crossentropy for a binary (two-class) problem and categorical_crossentropy for more than two classes. Now I am wondering: is there any reason not to use the latter for a two-class problem as well?
Answer 1:
categorical_crossentropy:
- accepts only one correct class per sample
- takes "only" the true neuron and computes the crossentropy with that neuron
binary_crossentropy:
- accepts many correct classes per sample
- computes the crossentropy for "all neurons", treating each neuron as its own two-class problem with targets 0 and 1
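The difference between the two losses can be checked numerically. Below is a minimal NumPy sketch, not the Keras implementation: the function bodies, the EPS clipping constant, and the sample vectors are all assumptions made for illustration. It shows that the categorical loss depends only on the prediction at the true neuron, while the binary loss changes when the other neurons change.

```python
import numpy as np

EPS = 1e-7  # illustrative clipping constant to avoid log(0)

def categorical_crossentropy(y_true, y_pred):
    # -sum_i y_true[i] * ln(y_pred[i]) over the class axis
    y_pred = np.clip(y_pred, EPS, 1.0)
    return -np.sum(y_true * np.log(y_pred), axis=-1)

def binary_crossentropy(y_true, y_pred):
    # mean over neurons of -(y * ln(p) + (1 - y) * ln(1 - p))
    y_pred = np.clip(y_pred, EPS, 1.0 - EPS)
    per_neuron = -(y_true * np.log(y_pred)
                   + (1.0 - y_true) * np.log(1.0 - y_pred))
    return np.mean(per_neuron, axis=-1)

# One correct class (index 0) out of three.
y_true = np.array([1.0, 0.0, 0.0])
pred_a = np.array([0.6, 0.3, 0.1])
pred_b = np.array([0.6, 0.2, 0.2])  # same value on the true neuron

cce_a = categorical_crossentropy(y_true, pred_a)
cce_b = categorical_crossentropy(y_true, pred_b)
bce_a = binary_crossentropy(y_true, pred_a)
bce_b = binary_crossentropy(y_true, pred_b)

print("categorical:", cce_a, cce_b)  # identical: only the true neuron counts
print("binary:     ", bce_a, bce_b)  # differ: every neuron contributes

# Several correct classes per sample only make sense with the binary loss.
y_multi = np.array([1.0, 1.0, 0.0])
pred_multi = np.array([0.9, 0.8, 0.1])
print("multi-label binary:", binary_crossentropy(y_multi, pred_multi))
```

Averaging over the neurons in binary_crossentropy is a choice made for this sketch; summing instead would scale the value but not change the comparison.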
A 2-class problem can be modeled as:
- 2-neuron output with only one correct class: softmax + categorical_crossentropy
- 1-neuron output, one class is 0, the other is 1: sigmoid + binary_crossentropy
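For a given pair of logits, the two formulations above produce the same loss value, because the softmax probability of class 1 equals the sigmoid of the logit difference z1 - z0. The following is a minimal NumPy sketch of that identity; the logits and labels are made-up values, and the helper functions are written here for illustration rather than taken from Keras.

```python
import numpy as np

def softmax(z):
    # numerically stable softmax over the last axis
    z = z - np.max(z, axis=-1, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=-1, keepdims=True)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical logits of a 2-neuron output layer for three samples.
logits = np.array([[1.2, -0.3],
                   [0.1, 2.0],
                   [-1.0, -1.5]])
labels = np.array([0, 1, 1])  # index of the correct class per sample

# 2 neurons: softmax + categorical crossentropy
probs = softmax(logits)
cce = -np.log(probs[np.arange(len(labels)), labels])

# 1 neuron: sigmoid + binary crossentropy, using the logit z1 - z0
p1 = sigmoid(logits[:, 1] - logits[:, 0])
bce = -(labels * np.log(p1) + (1 - labels) * np.log(1 - p1))

print(cce)
print(bce)
print(np.allclose(cce, bce))
```

This only compares loss values for fixed logits. The 2-neuron model carries a redundant set of output weights, so the two networks are not guaranteed to follow identical training trajectories.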
Explanation
Categorical crossentropy is loss = -sum_i y_true[i] * ln(y_pred[i]). Notice that the term y_true[i] is 1 only for the true neuron and 0 for every other neuron, so all the other terms vanish.
The equation reduces to simply: -ln(y_pred[correct_label]).
Binary crossentropy is, for each neuron, loss = -(y_true * ln(y_pred) + (1 - y_true) * ln(1 - y_pred)). Notice that it has two terms: one that is active when 1 is the correct class, and another that is active when 0 is the correct class.
Source: https://stackoverflow.com/questions/59216024/using-categorical-crossentropy-for-only-two-classes