I have a classification problem (3 classes only) and I trained a teacher-student model.
The trained student model (with knowledge distillation) performs less than the stu