Python Information gain implementation
问题 I am currently using scikit-learn for text classification on the 20ng dataset. I want to calculate the information gain for a vectorized dataset. It has been suggested to me that this can be accomplished, using mutual_info_classif from sklearn. However, this method is really slow, so I was trying to implement information gain myself based on this post. I came up with the following solution: from scipy.stats import entropy import numpy as np def information_gain(X, y): def _entropy(labels):