Question
I am working on a fraud analytics project and I need some help with boosting. I previously used SAS Enterprise Miner to learn about boosting/ensemble techniques, and I learned that boosting can help improve a model's performance.
Currently, my group has completed the following models in Python: Naive Bayes, Random Forest, and Neural Network. We want to use XGBoost to improve the F1-score. I am not sure if this is possible, since I have only come across tutorials on how to do XGBoost or Naive Bayes on its own.
I am looking for a tutorial that shows how to create a Naive Bayes model and then apply boosting to it. After that, we could compare the metrics with and without boosting to see whether it improves anything. I am relatively new to machine learning, so I could be wrong about this concept.
I thought of replacing the values in the XGBoost code, but I am not sure which ones to change, or whether it can even work this way.
Naive Bayes
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X_sm, y_sm, test_size=0.2, random_state=0)
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import confusion_matrix, accuracy_score, f1_score, precision_score, recall_score
nb = GaussianNB()
nb.fit(X_train, y_train)
nb_pred = nb.predict(X_test)
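For the with/without-boosting comparison, the baseline metrics can be read off with the functions imported above; a minimal sketch, reusing the variable names from this snippet:
# Baseline metrics for the plain Naive Bayes model
print(confusion_matrix(y_test, nb_pred))
print("Accuracy :", accuracy_score(y_test, nb_pred))
print("Precision:", precision_score(y_test, nb_pred))
print("Recall   :", recall_score(y_test, nb_pred))
print("F1-score :", f1_score(y_test, nb_pred))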
XGBoost
from sklearn.model_selection import train_test_split
import xgboost as xgb
from xgboost import XGBClassifier
X_train, X_test, y_train, y_test = train_test_split(X_sm, y_sm, test_size=0.2, random_state=0)
model = XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
                      colsample_bynode=1, colsample_bytree=0.9, gamma=0,
                      learning_rate=0.1, max_delta_step=0, max_depth=10,
                      min_child_weight=1, n_estimators=500, n_jobs=-1,
                      objective='binary:logistic', random_state=0,
                      reg_alpha=0, reg_lambda=1, scale_pos_weight=1,
                      subsample=0.9, verbosity=0)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
predictions = [round(value) for value in y_pred]
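The same metric functions give the XGBoost side of the comparison; a minimal sketch, assuming the metric imports from the Naive Bayes block above:
# XGBoost metrics on the same train/test split, for a like-for-like comparison
print(confusion_matrix(y_test, predictions))
print("Accuracy :", accuracy_score(y_test, predictions))
print("F1-score :", f1_score(y_test, predictions))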
Answer 1:
In theory, boosting any (base) classifier is easy and straightforward with scikit-learn's AdaBoostClassifier. E.g. for a Naive Bayes classifier, it should be:
from sklearn.ensemble import AdaBoostClassifier
from sklearn.naive_bayes import GaussianNB
nb = GaussianNB()
model = AdaBoostClassifier(base_estimator=nb, n_estimators=10)
model.fit(X_train, y_train)
and so on.
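To make the with/without-boosting comparison the question asks for, a minimal sketch could look like the following (assuming the same X_train/X_test/y_train/y_test split as in the question, and scikit-learn's f1_score; note that recent scikit-learn versions name the argument estimator rather than base_estimator):
from sklearn.ensemble import AdaBoostClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import f1_score

# Plain Naive Bayes baseline
nb = GaussianNB().fit(X_train, y_train)
print("NB F1        :", f1_score(y_test, nb.predict(X_test)))

# AdaBoost with Naive Bayes as the base estimator
# (use estimator=GaussianNB() on scikit-learn >= 1.2)
boosted_nb = AdaBoostClassifier(base_estimator=GaussianNB(), n_estimators=10)
boosted_nb.fit(X_train, y_train)
print("Boosted NB F1:", f1_score(y_test, boosted_nb.predict(X_test)))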
In practice, we never use Naive Bayes or Neural Nets as base classifiers for boosting (let alone Random Forests, which are themselves an ensemble method).
AdaBoost (and similar boosting methods derived later, like GBM and XGBoost) was conceived using decision trees (DTs) as base classifiers (more specifically, decision stumps, i.e. DTs with a depth of only 1); there is a good reason why, still today, if you don't explicitly set the base_estimator argument in scikit-learn's AdaBoostClassifier above, it defaults to DecisionTreeClassifier(max_depth=1), i.e. a decision stump.
DTs are suitable for such ensembling because they are essentially unstable classifiers (small changes in the training data can produce very different trees), which is not the case with the other algorithms mentioned; hence the latter are not expected to offer anything when used as base classifiers for boosting algorithms.
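For illustration, a minimal sketch of the stump-based default described above (only n_estimators and random_state are chosen here; everything else is left at its defaults):
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import f1_score

# With no base estimator given, AdaBoost uses
# DecisionTreeClassifier(max_depth=1), i.e. a decision stump
stump_boost = AdaBoostClassifier(n_estimators=100, random_state=0)
stump_boost.fit(X_train, y_train)
print("Stump AdaBoost F1:", f1_score(y_test, stump_boost.predict(X_test)))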
Source: https://stackoverflow.com/questions/58572881/can-i-use-xgboost-to-boost-other-models-eg-naive-bayes-random-forest