I have an imbalanced dataset containing binary classification problem.I have built Random Forest Classifier and used k fold cross validation with 10 folds.
kfold
When you use cross_val_score
method, you can specify, which scorings you can calculate on each fold:
from sklearn.metrics import make_scorer, accuracy_score, precision_score, recall_score, f1_score
scoring = {'accuracy' : make_scorer(accuracy_score),
'precision' : make_scorer(precision_score),
'recall' : make_scorer(recall_score),
'f1_score' : make_scorer(f1_score)}
kfold = model_selection.KFold(n_splits=10, random_state=42)
model=RandomForestClassifier(n_estimators=50)
results = model_selection.cross_val_score(estimator=model,
X=features,
y=labels,
cv=kfold,
scoring=scoring)
After cross validation, you will get results
dictionary with keys: 'accuracy', 'precision', 'recall', 'f1_score', which store metrics values on each fold for certain metric. For each metric you can calculate mean and std value by using np.mean(results[value])
and np.std(results[value])
, where value - one of your specified metric name.