I used the following implementation of a class-based Binary-Cross-Entropy loss, but the loss stays almost constant during training.
The class-based Binary-Cross-Entropy
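For comparison, a class-weighted binary cross-entropy is usually defined as `loss = -mean(w_pos * y * log(p) + w_neg * (1 - y) * log(1 - p))`, where `y` is the 0/1 target and `p` is the predicted probability. The minimal pure-Python sketch below assumes that "class-based" means one weight for the positive class and one for the negative class. The function name `weighted_bce` and the parameters `pos_weight`, `neg_weight` and `eps` are hypothetical and do not come from the original code.

```python
import math
from typing import Sequence


def weighted_bce(y_true: Sequence[float], y_pred: Sequence[float],
                 pos_weight: float = 1.0, neg_weight: float = 1.0,
                 eps: float = 1e-7) -> float:
    """Mean class-weighted binary cross-entropy over a batch (hypothetical helper).

    pos_weight scales the loss on positive targets (y = 1),
    neg_weight scales the loss on negative targets (y = 0).
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    total = 0.0
    for y, p in zip(y_true, y_pred):
        # Clip the prediction into [eps, 1 - eps] so log() never sees 0.
        p = min(max(p, eps), 1.0 - eps)
        total += -(pos_weight * y * math.log(p)
                   + neg_weight * (1.0 - y) * math.log(1.0 - p))
    return total / len(y_true)
```

With both weights left at 1.0, this reduces to ordinary binary cross-entropy. For example, `weighted_bce([1, 0], [0.5, 0.5])` returns log 2 (about 0.693). Setting `pos_weight=2.0` raises the result to 1.5 × log 2 because the positive sample counts twice.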