I'm trying to predict a binary variable with both random forests and logistic regression. I've got heavily unbalanced classes (approx. 1.5% of Y = 1).
The `scoring` option is just a performance-evaluation tool used on the test sample; it does not enter into the internal `DecisionTreeClassifier` algorithm at each split node. For the tree algorithm you can only specify the `criterion` (a kind of internal loss function evaluated at each split node) to be either `gini` or information entropy (`entropy`).
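As a minimal sketch (assuming scikit-learn), the split criterion is set at construction time, separately from any scoring:

```python
from sklearn.tree import DecisionTreeClassifier

# `criterion` controls the internal split loss; "gini" is the default,
# "entropy" uses information gain instead. `max_depth=5` is an arbitrary
# illustrative value.
clf = DecisionTreeClassifier(criterion="entropy", max_depth=5)
```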
`scoring` can be used in a cross-validation context, where the goal is to tune some hyperparameters (like `max_depth`). In your case, you can use a `GridSearchCV` to tune some of your hyperparameters with the scoring function `roc_auc`.
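A rough sketch of that setup, assuming scikit-learn and a synthetic dataset with roughly the same imbalance as yours (the grid values here are just illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic imbalanced data: ~1.5% positives, mimicking the question.
X, y = make_classification(
    n_samples=3000, n_features=20, weights=[0.985], random_state=0
)

param_grid = {
    "max_depth": [3, 5, None],          # tree depth to tune
    "criterion": ["gini", "entropy"],   # internal split loss at each node
}

search = GridSearchCV(
    RandomForestClassifier(n_estimators=50, random_state=0),
    param_grid,
    scoring="roc_auc",  # CV evaluation metric; unrelated to the split criterion
    cv=3,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```

ROC AUC is ranking-based, so unlike accuracy it is not dominated by the 98.5% majority class, which is why it is a sensible `scoring` choice here.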