I'm trying to train multiple models with each other. There are more than two models, and each model has its own cross-entropy loss. There is also a KL divergence loss for these m
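
To make the setup concrete, here is a minimal, framework-free sketch of how the per-model losses could be combined. It rests on assumptions the post does not state: each model's loss is its own cross-entropy plus the average KL divergence from every other model's prediction to its own (the mutual-learning style of combination). The names `mutual_losses` and `kl_weight` are hypothetical, and the logits are made-up numbers for illustration only.

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, target):
    # Negative log-likelihood of the true class index.
    return -math.log(probs[target])

def kl_divergence(p, q):
    # KL(p || q) = sum_k p_k * log(p_k / q_k); terms with p_k == 0 contribute 0.
    return sum(pk * math.log(pk / qk) for pk, qk in zip(p, q) if pk > 0)

def mutual_losses(all_logits, target, kl_weight=1.0):
    # all_logits: one list of class logits per model, for a single sample.
    # Assumed combination (not from the post): CE_i + kl_weight * mean_j KL(p_j || p_i).
    probs = [softmax(logits) for logits in all_logits]
    losses = []
    for i, p_i in enumerate(probs):
        ce = cross_entropy(p_i, target)
        peers = [p_j for j, p_j in enumerate(probs) if j != i]
        kl = sum(kl_divergence(p_j, p_i) for p_j in peers) / len(peers) if peers else 0.0
        losses.append(ce + kl_weight * kl)
    return losses

# Three hypothetical models scoring one 3-class sample whose true class is 0.
logits = [[2.0, 0.5, -1.0], [1.0, 1.0, 0.0], [0.2, 2.5, 0.1]]
losses = mutual_losses(logits, target=0)
for i, loss in enumerate(losses):
    print(f"model {i}: loss = {loss:.4f}")
```

In an autograd framework the same structure would typically apply, with the peers' probabilities treated as constants when computing model `i`'s loss so that each loss only updates its own model; whether that matches the intended setup is also an assumption.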