In a multi-label classification problem, how is each sample evaluated? Is each individual label scored on its own, or should the sample's whole set of labels be judged as a single unit? An example would be appreciated.
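
To make the two readings concrete, here is a minimal sketch on made-up data. The arrays `y_true` and `y_pred` and both helper function names are hypothetical, chosen only for illustration. The first function scores every label decision separately (this is 1 minus the Hamming loss); the second counts a sample as correct only if all of its labels match (often called subset accuracy or exact match).

```python
# Hypothetical toy data: 3 samples, 4 possible labels (1 = label present).
y_true = [
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [1, 1, 0, 1],
]
y_pred = [
    [1, 0, 1, 0],  # all 4 labels correct
    [0, 1, 1, 0],  # 3 of 4 labels correct
    [1, 0, 0, 0],  # 2 of 4 labels correct
]


def per_label_accuracy(y_true, y_pred):
    """Fraction of individual label decisions that are correct (1 - Hamming loss)."""
    correct = sum(
        t == p
        for true_row, pred_row in zip(y_true, y_pred)
        for t, p in zip(true_row, pred_row)
    )
    total = sum(len(true_row) for true_row in y_true)
    return correct / total


def exact_match_accuracy(y_true, y_pred):
    """Fraction of samples whose whole label vector is predicted exactly."""
    matches = sum(
        true_row == pred_row for true_row, pred_row in zip(y_true, y_pred)
    )
    return matches / len(y_true)


# 9 of the 12 label decisions are correct -> 0.75
print(per_label_accuracy(y_true, y_pred))
# Only 1 of the 3 samples is correct on every label
print(exact_match_accuracy(y_true, y_pred))
```

On this toy data the per-label view gives partial credit to the second and third samples, while the whole-sample view gives them none, so the same predictions can look quite different depending on which of the two is meant.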