For a discrete random variable X, how is the entropy H(X) defined? Let f be the function that maps an input X to a label. You should try to maximize H(f(X)), the entropy of the labels. How can we write a